Storage through iterative DNA editing

ABSTRACT

Information is stored in existing DNA through an iterative process of creating a break in dsDNA and adding new DNA by repairing the break with a homologous repair template. The order and sequence of DNA sequences added to the breaks in the dsDNA can encode binary data. By using a context-dependent encoding scheme, three unique homologous repair templates can encode an unbounded number of bits. When the existing DNA is in a cell, the changes are heritably passed to subsequent generations of the cell. Synthesis of the homologous repair templates may be under the control of a promoter and operator. Intra- or extra-cellular signals may regulate the synthesis of homologous repair templates.

RELATED APPLICATIONS

This application is a division of U.S. patent application Ser. No. 15/623,925 filed on Jun. 15, 2017, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/357,828 filed on Jul. 1, 2016 and U.S. Provisional Patent Application Ser. No. 62/399,190 filed on Sep. 23, 2016, all of which are incorporated herein by reference.

BACKGROUND

The ability to store and retrieve arbitrary information within existing deoxyribonucleic acids (DNA), such as the DNA in cells, is an important goal for synthetic biology that has the potential to create a low-cost, high-density storage medium for binary data. To date, existing storage methods within cells have recorded at most a few bytes of information and require specialized, orthogonal genetic constructs for each bit, greatly limiting the maximum storage capacity.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Precise gene editing techniques such as the CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein) system and TALENs (transcription activator-like effector nucleases) enable manipulation of DNA in ways that can be used to store arbitrary information or state information of a cell. The methods described herein are capable of dynamically recording an unbounded amount of binary data within a live cell by incrementally editing cellular DNA using gene editing techniques. The information may be transduced from any intra- or extra-cellular signal and is permanently recorded in the DNA in a manner that allows the information to be passed to subsequent cellular generations. These methods may also be used in cell-free or synthetic cell systems.

DNA in a cell, or in another environment such as a cell-free system, may be cut to create a double-strand break (“DSB”). New DNA may be inserted into the break using homology-directed repair (HDR). Careful design of a target site in the original DNA, together with design of the homologous repair templates, makes it possible to repeatedly cut the DNA and add new DNA within the cut, thereby adding DNA within an existing DNA strand in a way that encodes binary data or other information.

The sequence of additions may encode 0s and 1s, allowing arbitrary binary data to be stored in a way that is heritably passed to new cells during cell division. Signaling pathways may be used to control the insertion of specific homologous repair templates and/or the availability of specific enzymes in order to direct the cell to record either a 0 or a 1.

BRIEF DESCRIPTION OF THE DRAWINGS

The Detailed Description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.

FIG. 1 shows a schematic representation of cutting dsDNA with an enzyme and inserting new DNA by homology-directed repair.

FIG. 2 shows a schematic representation of cutting the dsDNA of FIG. 1 and inserting additional DNA by homology-directed repair.

FIG. 3 shows an illustrative process for cutting dsDNA, inserting DNA through homology-directed repair, and sequencing the DNA to identify the encoded binary data.

FIG. 4 shows an illustrative process for inserting DNA into two locations in dsDNA according to two different encoding schemes.

FIG. 5 shows illustrative components of a cell for inserting new DNA into existing DNA.

FIG. 6 shows an illustrative system for implementing features of this disclosure.

DETAILED DESCRIPTION

There are many techniques for encoding binary data in the sequence of a nucleic acid. Most of those techniques are based on synthesizing new nucleic acids and strive for a compact encoding in which only a few nucleic acid bases are needed to encode a 1 or 0. In contrast, the technique described in this disclosure adds new DNA sequences in the middle of an existing DNA molecule to encode binary data. Because working on existing DNA involves manipulation with enzymes, the presence of binding sites, and similar constraints, the encoding is not as compact as it could be in a system that uses newly synthesized DNA.

The technique of this disclosure stores data in DNA by repeatedly cutting and inserting a new sequence into the existing DNA. Each insert provides, within the insert, a target site for the next round of DNA cutting and subsequent insertion. Repeating this process creates a DNA molecule with a series of nested inserts. The order of the nested inserts may be interpreted as encoding a series of 1s and 0s.
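
The nesting described above can be modeled at the level of site labels. The following Python sketch is illustrative only and is not the disclosed protocol: it assumes that each target site can be written as two halves (hypothetical labels such as "102A" and "102B", borrowed from the reference numbers in the figures), that a DSB always falls between the two halves, and that each round cuts the site inserted in the previous round.

```python
# Label-level model of iterative nested insertion (illustrative sketch only).
# Each target site is represented by two halves, e.g. "102A" and "102B";
# the DSB is assumed to fall between them.

def insert_site(molecule: list[str], cut_site: str, new_site: str) -> list[str]:
    """Cut between the halves of cut_site and place both halves of new_site in the gap."""
    left, right = f"{cut_site}A", f"{cut_site}B"
    for i in range(len(molecule) - 1):
        if molecule[i] == left and molecule[i + 1] == right:
            return molecule[: i + 1] + [f"{new_site}A", f"{new_site}B"] + molecule[i + 1 :]
    raise ValueError(f"intact target site {cut_site} not found")


def record(sites: list[str]) -> list[str]:
    """Start from the first site and nest each later site inside the previous one."""
    molecule = [f"{sites[0]}A", f"{sites[0]}B"]
    for previous, current in zip(sites, sites[1:]):
        molecule = insert_site(molecule, previous, current)
    return molecule


print(record(["102", "116", "212"]))
# ['102A', '116A', '212A', '212B', '116B', '102B']
```

Reading the A halves from left to right recovers the order in which the sites were inserted, while the B halves appear in reverse order on the other side of the innermost insert. Once a site has been split by an insert it is no longer intact, so it cannot be cut again.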

By “hybridizable” or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g., DNA) comprises a sequence of nucleotides that enables it to non-covalently bind to another nucleic acid in a sequence-specific, antiparallel manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. As is known in the art, standard Watson-Crick base-pairing includes adenine (A) pairing with thymine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C). In addition, it is also known in the art that for hybridization between two RNA molecules (e.g., dsRNA), guanine (G) base pairs with uracil (U). In the context of this disclosure, a guanine (G) is considered complementary to a uracil (U), and vice versa. As such, when a G/U base pair can be made at a given nucleotide position of a protein-binding segment (dsRNA duplex) of a subject DNA-targeting RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.
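
The pairing rule defined above, including the G/U convention, can be written as a small predicate. This Python sketch is illustrative: the function names are hypothetical, and it only checks base identities position by position, assuming the two strands are already aligned end to end.

```python
# Pairing rule as defined above: Watson-Crick pairs (A-T, A-U, G-C) plus G-U,
# which this disclosure counts as complementary. Illustrative sketch only.
ALLOWED_PAIRS = {frozenset("AT"), frozenset("AU"), frozenset("GC"), frozenset("GU")}


def is_complementary(base1: str, base2: str) -> bool:
    """True if the two bases form an allowed pair (order does not matter)."""
    return frozenset((base1.upper(), base2.upper())) in ALLOWED_PAIRS


def antiparallel_matches(strand1: str, strand2: str) -> list[bool]:
    """Compare two equal-length strands, both written 5'->3', in antiparallel register.

    Position i of strand1 is paired with position -(i + 1) of strand2.
    """
    if len(strand1) != len(strand2):
        raise ValueError("strands must be the same length")
    return [is_complementary(a, b) for a, b in zip(strand1, reversed(strand2))]


print(antiparallel_matches("GGU", "ACU"))  # [True, True, True]; G-U counts as complementary
```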

Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein; and Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001). The conditions of temperature and ionic strength determine the “stringency” of the hybridization.

Hybridization requires that the two nucleic acids contain complementary sequences, although mismatches between bases are possible. The conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementarity, variables well known in the art. The greater the degree of complementarity between two nucleotide sequences, the higher the melting temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations between nucleic acids with short stretches of complementarity (e.g., complementarity over 35 or fewer, 30 or fewer, 25 or fewer, 22 or fewer, 20 or fewer, or 18 or fewer nucleotides), the position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8). Typically, the length for a hybridizable nucleic acid is at least about 10 nucleotides. Illustrative minimum lengths for a hybridizable nucleic acid are: at least about 15 nucleotides; at least about 20 nucleotides; at least about 22 nucleotides; at least about 25 nucleotides; and at least about 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature, pH, and wash solution salt concentration may be adjusted as necessary according to factors such as the length of the region of complementarity and the degree of complementarity.
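
To make the dependence of Tm on sequence concrete, the sketch below uses the Wallace rule, a well-known rule of thumb for short, perfectly matched DNA oligonucleotides (roughly 14 to 20 nt): Tm ≈ 2 °C × (number of A + T) + 4 °C × (number of G + C). This is a swapped-in approximation, not a method from this disclosure. It does not model salt concentration, mismatches, or nearest-neighbor effects, which are the factors the paragraph above says must be adjusted for.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate Tm (degrees C) of a short, perfectly matched DNA duplex (Wallace rule)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("only A, C, G and T are supported")
    return 2 * at + 4 * gc


print(wallace_tm("ATGCATGCATGCATGCATGC"))  # 10 A/T + 10 G/C -> 60
print(wallace_tm("ATATATATATATATATATAT"))  # 20 A/T -> 40
```

The two 20-mers have the same length but different base composition, and the rule predicts a 20 °C difference between them. This is one reason hybridization conditions are tuned to each probe.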

It is understood in the art that the sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure). A polynucleotide can comprise at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it is targeted. For example, an antisense nucleic acid in which 18 of 20 nucleotides are complementary to a target region, and which would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using BLAST programs (basic local alignment search tools) and PowerBLAST programs known in the art (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison, Wis.) with default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
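
The 18-of-20 example above can be reproduced with a short calculation. This Python sketch is illustrative only: unlike BLAST or Gap, it does no alignment. It assumes the probe and target are already aligned end to end with no gaps, and it uses standard DNA Watson-Crick pairs only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(probe: str, target: str) -> float:
    """Percent of positions at which probe pairs with target.

    Both strands are written 5'->3' and assumed to be aligned end to end in
    antiparallel register, with no gaps (no alignment is performed).
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must be the same length")
    pairs = zip(probe.upper(), reversed(target.upper()))
    matches = sum(COMPLEMENT[p] == t for p, t in pairs)
    return 100.0 * matches / len(probe)


target = "ACGTACGTACGTACGTACGT"                            # 20 nt
perfect = "".join(COMPLEMENT[b] for b in reversed(target))  # reverse complement
probe = "TT" + perfect[2:]                                 # 2 mismatches at the 5' end
print(percent_complementarity(perfect, target))  # 100.0
print(percent_complementarity(probe, target))    # 90.0
```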

FIG. 1 shows an illustrative schematic of operations to add a new DNA sequence into a double-stranded DNA (dsDNA) 100. The dsDNA 100 includes a target site 102 that directs an enzyme 104 to create a DSB in the dsDNA 100 within the target site 102. The DSB may be created with blunt ends or with sticky ends depending on the specific enzyme and technique for making the DSB. The target site 102 is a sequence of DNA recognized by an enzyme that creates DSBs in dsDNA. The target site 102 may be intentionally introduced into the dsDNA 100 to enable the manipulations described below. Alternatively, a pre-existing portion of the dsDNA 100 may be selected as the target site 102. If a pre-existing portion of the dsDNA 100 is selected as the target site 102, then the sequences of the other components of the system will be designed with reference to the sequence of the target site 102. In some implementations, the target site 102 is unique such that there is only one target site 102 in the entire dsDNA strand. The dsDNA 100 may be genomic DNA inside a living prokaryotic or eukaryotic cell, DNA introduced into a living cell such as a plasmid or vector, or DNA in a cell-free system. The dsDNA 100 may exist as either linear or circular DNA prior to introduction of the DSB.
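
Whether a candidate target site 102 is unique can be checked by counting its occurrences on both strands. This Python sketch is illustrative. The helper names are hypothetical, the sequences are plain A/C/G/T strings, and a palindromic site, which reads the same on both strands, is counted once per position.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_occurrences(haystack: str, needle: str) -> int:
    """Count overlapping occurrences of needle in haystack."""
    return sum(
        haystack.startswith(needle, i) for i in range(len(haystack) - len(needle) + 1)
    )


def target_site_count(dna: str, site: str) -> int:
    """Occurrences of site on either strand of dna (palindromic sites counted once)."""
    dna, site = dna.upper(), site.upper()
    count = count_occurrences(dna, site)
    rc = reverse_complement(site)
    if rc != site:
        count += count_occurrences(dna, rc)
    return count


site = "GATTACAGG"
dna = "CCC" + site + "TTTT"
print(target_site_count(dna, site) == 1)  # True: the site is unique in this toy sequence
```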

The enzyme 104 that creates the DSB may be any protein, protein-RNA complex, or protein-DNA complex (including multimeric complexes) that has the property of creating a DSB in dsDNA at a specific target site. Non-limiting examples of suitable enzymes include restriction enzymes, homing endonucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and CRISPR/Cas. The enzymes listed above are all examples of site-specific nucleases that are capable of causing a DSB within a target site.

Restriction enzymes (restriction endonucleases) are present in many species and are capable of sequence-specific binding to DNA (at a target or recognition site) and of cleaving DNA at or near the site of binding. Over 3000 restriction enzymes have been studied in detail, and more than 600 of these are available commercially. Naturally occurring restriction endonucleases are categorized into four groups (Types I, II, III, and IV) based on their composition and enzyme cofactor requirements, the nature of their target sequence, and the position of their DNA cleavage site relative to the target sequence. All types of these enzymes recognize specific short DNA sequences and carry out the endonucleolytic cleavage of DNA to give specific fragments with terminal 5′-phosphates. Type II enzymes cleave within or at short specific distances from a recognition site; most require magnesium and are single-function (restriction) enzymes that act independently of a methylase. Type II enzymes form homodimers, with recognition sites that are usually undivided, palindromic, and 4-8 nucleotides in length. They recognize and cleave DNA at the same site, and they do not use ATP or AdoMet for their activity; they usually require only Mg²⁺ as a cofactor. Common Type II restriction enzymes include HhaI, HindIII, NotI, EcoRI, and BglI. Restriction enzymes may cut dsDNA in a way that leaves either blunt ends or sticky ends. Protocols for creating a DSB in dsDNA with restriction enzymes are well known to those skilled in the art. Restriction digestion is a common molecular biology technique and is typically performed using the reagents and protocols provided in a commercially available restriction digest kit. Examples of companies that provide restriction digest kits include New England BioLabs, Promega, Sigma-Aldrich, and Thermo Fisher Scientific. Each of these companies provides restriction digest protocols on its website.
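
The following Python sketch models how a Type II enzyme's recognition site and cut position determine the fragments and sticky ends. It is an in-silico illustration, not a digestion protocol. The recognition sites and cut positions used are the standard ones for EcoRI (G^AATTC), HindIII (A^AGCTT), and NotI (GC^GGCCGC). The overhang formula assumes a palindromic site cut symmetrically on both strands.

```python
# Recognition site and top-strand cut offset (number of bases before the cut)
# for three of the Type II enzymes named above.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "NotI": ("GCGGCCGC", 2),   # GC^GGCCGC
}


def digest(seq: str, enzyme: str) -> list[str]:
    """Top-strand fragments after cutting at every recognition site."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts, start = [], 0
    while (hit := seq.find(site, start)) != -1:
        cuts.append(hit + offset)
        start = hit + 1
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def overhang_length(enzyme: str) -> int:
    """5' overhang length for a palindromic site cut symmetrically on both strands."""
    site, offset = ENZYMES[enzyme]
    return len(site) - 2 * offset


print(digest("AAAGAATTCTTT", "EcoRI"))  # ['AAAG', 'AATTCTTT']
print(overhang_length("EcoRI"))         # 4 (the AATT sticky end)
```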

Homing endonucleases (HEs), which are also known as meganucleases, are a collection of double-stranded DNases that have large, asymmetric recognition sites (12-40 base pairs) and coding sequences that are usually embedded in either introns or inteins. Introns are spliced out of precursor RNAs, while inteins are spliced out of precursor proteins. HEs catalyze the hydrolysis of genomic DNA within the cells that synthesize them, but do so at very few, or even singular, locations. HE recognition sites are extremely rare. For example, an 18 base pair recognition sequence will occur only once in every 7×10¹⁰ base pairs of random sequence. This is equivalent to only one site in 20 mammalian-sized genomes. However, unlike restriction endonucleases, HEs tolerate some sequence degeneracy within their recognition sequence. Thus, single base changes do not abolish cleavage but reduce its efficiency to variable extents. As a result, their observed sequence specificity is typically in the range of 10-12 base pairs. Examples of suitable protocols using HEs may be found in Karen Flick et al., DNA Binding and Cleavage by the Nuclear Intron-Encoded Homing Endonuclease I-PpoI, 394 Nature 96-101 (1998) and Brett Chevalier et al., Design, Activity, and Structure of a Highly Specific Artificial Endonuclease, 10 Molecular Cell 895-905 (2002).
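
The rarity figures above follow from simple arithmetic. The Python sketch below reproduces them under three assumptions: the sequence is random with equal base frequencies, only one strand is counted (as in the text), and a "mammalian-sized" genome is taken to be 3.5×10⁹ bp.

```python
def expected_spacing(site_length: int) -> int:
    """Expected bp between occurrences of one fixed site in random sequence
    (one strand, equal base frequencies)."""
    return 4 ** site_length


GENOME_SIZE_BP = 3.5e9  # assumed size of a "mammalian-sized" genome

spacing = expected_spacing(18)
print(f"one 18-bp site per {spacing:.1e} bp")                  # 6.9e+10
print(f"about {spacing / GENOME_SIZE_BP:.0f} genomes per site")  # about 20
```

If both strands are scanned, a non-palindromic site is expected about twice as often, so the spacing halves.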

Zinc finger nucleases (ZFNs) are synthetic proteins consisting of an engineered zinc finger DNA-binding domain fused to the cleavage domain of the FokI restriction endonuclease. ZFNs can be used to induce DSBs in specific DNA sequences and thereby promote site-specific homologous recombination and targeted manipulation of genomic loci in a variety of different cell types. The introduction of a DSB into dsDNA may enhance the efficiency of recombination with an exogenously introduced homologous repair template. ZFNs consist of a DNA-binding zinc finger domain (composed of three to six fingers) covalently linked to the non-specific DNA cleavage domain of the bacterial FokI restriction endonuclease. ZFNs can bind as dimers to their target DNA sites, with each monomer using its zinc finger domain to recognize a half-site. Dimerization of ZFNs is mediated by the FokI cleavage domain, which cleaves within a five or six base pair “spacer” sequence that separates the two inverted “half-sites.” Because the DNA-binding specificities of zinc finger domains can in principle be re-engineered using one of various methods, customized ZFNs can be constructed to target nearly any DNA sequence. One of ordinary skill in the art will know how to design and use ZFNs to create DSBs in dsDNA at a desired target site. Some suitable protocols are available in Philipsbom, A, et al., Nature Protocols, 2006, 1, 1322-1328; John Young and Richard Harland, Targeted Gene Disruption with Engineered Zinc Finger Nucleases (ZFNs), 917 Xenopus Protocols 129-141 (2012); and Hansen, K., et al., Genome Editing with CompoZr Custom Zinc Finger Nucleases (ZFNs), J. Vis. Exp. 2012, 64, e3304, doi:10.3791/3304.

TALENs are restriction enzymes that can be engineered to cut specific sequences of DNA. They are made by fusing a TAL effector DNA-binding domain to a DNA cleavage domain (a nuclease that cuts DNA strands). Transcription activator-like effectors (TALEs) can be engineered to bind practically any desired DNA sequence, so when a TALE is combined with a nuclease, DNA can be cut at specific locations. These enzymes can be introduced into cells for use in gene editing or for genome editing in situ. The DNA-binding domain contains a repeated, highly conserved 33-34 amino acid sequence with divergent 12th and 13th amino acids. These two positions, referred to as the Repeat Variable Diresidue (RVD), are highly variable and show a strong correlation with specific nucleotide recognition. This straightforward relationship between amino acid sequence and DNA recognition has allowed for the engineering of specific DNA-binding domains by selecting a combination of repeat segments containing the appropriate RVDs. Notably, slight changes in the RVD and the incorporation of “nonconventional” RVD sequences can improve targeting specificity. One of ordinary skill in the art will know how to design and use TALENs to create DSBs in dsDNA at a desired target site. Some suitable protocols are available in Mario Hermann et al., Mouse Genome Engineering Using Designer Nucleases, 86 J. Vis. Exp. e50930, doi:10.3791/50930 (2014) and Tetsushi Sakuma et al., Efficient TALEN Construction and Evaluation Methods for Human Cell and Animal Applications, 18(4) Genes Cells 315-326 (2013).
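
The one-repeat-per-base relationship described above can be illustrated with the canonical RVD code: NI recognizes A, HD recognizes C, NG recognizes T, and NN recognizes G (NN can also bind A). This code is well known but is not stated in this disclosure. The Python sketch ignores practical design constraints such as the 5′ thymine that typically precedes a TALE binding site, the final half repeat, and nonconventional RVDs. The function name is hypothetical.

```python
# Canonical RVD chosen for each target base (one repeat per base).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def tale_repeats(target: str) -> list[str]:
    """RVDs, in order, for a TALE array binding target (written 5'->3')."""
    try:
        return [RVD_FOR_BASE[base] for base in target.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported base {err.args[0]!r}") from None


print(tale_repeats("TACG"))  # ['NG', 'NI', 'HD', 'NN']
```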

In the CRISPR/Cas nuclease system, the CRISPR locus encodes the RNA components of the system, and the Cas (CRISPR-associated) locus encodes proteins. CRISPR loci in microbial hosts contain a combination of CRISPR-associated (Cas) genes as well as non-coding RNA elements capable of programming the specificity of CRISPR-mediated nucleic acid cleavage.

The Type II CRISPR system is one of the best-characterized systems and creates a targeted DNA DSB in four sequential steps. First, two non-coding RNAs, the pre-crRNA array and tracrRNA, are transcribed from the CRISPR locus. Second, tracrRNA hybridizes to the repeat regions of the pre-crRNA and mediates the processing of pre-crRNA into mature crRNAs containing individual spacer sequences. Third, the mature crRNA:tracrRNA complex directs Cas9 to the target DNA via Watson-Crick base-pairing between the spacer on the crRNA and the protospacer on the target DNA next to the protospacer adjacent motif (PAM), an additional requirement for target recognition. Cas9 proteins from different organisms recognize different PAM sequences. For example, Streptococcus pyogenes (Sp) Cas9 has a PAM sequence of 5′-NGG-3′, Staphylococcus aureus (Sa) Cas9 has a PAM sequence of 5′-NGRRT-3′ or 5′-NGRRN-3′, Neisseria meningitidis (Nm) Cas9 has a PAM sequence of 5′-NNNNGATT-3′, Streptococcus thermophilus (St) Cas9 has a PAM sequence of 5′-NNAGAAW-3′, and Treponema denticola (Td) Cas9 has a PAM sequence of 5′-NAAAAC-3′.
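
The PAMs listed above use IUPAC ambiguity codes (N = any base, R = A or G, W = A or T). Those codes can be translated into a regular expression to locate candidate PAMs. The Python sketch below is illustrative. It includes only the NGRRT form of the Sa PAM, scans only the strand it is given (run it on the reverse complement to scan the other strand), and does not check guide complementarity.

```python
import re

# IUPAC codes needed for the PAMs listed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}

PAMS = {"Sp": "NGG", "Sa": "NGRRT", "Nm": "NNNNGATT", "St": "NNAGAAW", "Td": "NAAAAC"}


def pam_positions(seq: str, pam: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) PAM match on this strand."""
    pattern = "".join(IUPAC[c] for c in pam)
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


print(pam_positions("AGGTGG", PAMS["Sp"]))  # [0, 3]
```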

Finally, Cas9 mediates cleavage of the target DNA to create a double-strand break within the protospacer. In its natural context, activity of the CRISPR/Cas system comprises three steps: (i) insertion of alien DNA sequences into the CRISPR array to prevent future attacks, in a process called ‘adaptation’; (ii) expression of the relevant proteins, as well as expression and processing of the array; followed by (iii) RNA-mediated interference with the alien nucleic acid. Thus, in the bacterial cell, several of the so-called ‘Cas’ proteins are involved in the natural function of the CRISPR/Cas system and serve roles in functions such as insertion of the alien DNA.

CRISPR may also function with nucleases other than Cas9. Two genes from the Cpf1 family contain a RuvC-like endonuclease domain, but they lack Cas9's second endonuclease domain, HNH. Cpf1 cleaves DNA in a staggered pattern and requires only one RNA rather than the two (tracrRNA and crRNA) needed by Cas9 for cleavage. Cpf1's preferred PAM is 5′-TTN, differing from that of Cas9 (5′-NGG) in both genomic location and GC content. Mature crRNAs for Cpf1-mediated cleavage are 42-44 nucleotides in length, about the same size as Cas9's, but with the direct repeat preceding the spacer rather than following it. The Cpf1 crRNA is also much simpler in structure than Cas9's; only a short stem-loop structure in the direct repeat region is necessary for cleavage of a target. Cpf1 also does not require an additional tracrRNA. Whereas Cas9 generates blunt ends 3 nt upstream of the PAM site, Cpf1 cleaves in a staggered fashion, creating a 5-nucleotide 5′ overhang 18-23 bases away from the PAM.
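
The statement that Cas9 cuts 3 nt upstream of the PAM can be turned into a cut-position calculation. The Python sketch below is illustrative and assumes SpCas9 with a 20-nt protospacer immediately 5′ of an NGG PAM. It reports a blunt cut for every NGG on the given strand that has room for a full protospacer. It does not check whether a guide actually matches, and it does not model the staggered Cpf1 cut.

```python
import re


def candidate_spcas9_cuts(seq: str, protospacer_len: int = 20) -> list[int]:
    """Blunt-cut positions (index of the first base right of the cut) for every
    NGG PAM on this strand that has a full-length protospacer upstream of it."""
    cuts = []
    for m in re.finditer(r"(?=[ACGT]GG)", seq.upper()):
        pam_start = m.start()
        if pam_start >= protospacer_len:
            cuts.append(pam_start - 3)  # cut 3 nt upstream of the PAM
    return cuts


protospacer = "ACGTACGTACGTACGTACGT"
seq = protospacer + "TGG"
cut = candidate_spcas9_cuts(seq)[0]
print(seq[:cut], seq[cut:])  # ACGTACGTACGTACGTA CGTTGG
```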

There are also programmable nucleases that, unlike CRISPR/Cas9, do not use a PAM sequence, such as NgAgo. NgAgo functions with a 24-nucleotide ssDNA guide and is believed to cut 8-11 nucleotides from the start of this sequence. The ssDNA guide is loaded as the protein folds and cannot be swapped for a different guide unless the temperature is increased to a non-physiological 55° C. A few nucleotides in the target DNA are removed near the cut site. Techniques for using NgAgo are described in Feng Gao et al., DNA-guided Genome Editing Using the Natronobacterium gregoryi Argonaute, 34 Nature Biotechnology 768-770 (2016).

DSBs may be formed by making two single-stranded breaks at different locations, creating a cut DNA molecule with sticky ends. Single-strand breaks or “nicks” may be formed by modified versions of the Cas9 enzyme containing only one active catalytic domain (called “Cas9 nickase”). Cas9 nickases still bind DNA based on gRNA specificity, but nickases are only capable of cutting one of the DNA strands. Two nickases targeting opposite strands are required to generate a DSB within the target DNA (often referred to as a “double nick” or “dual nickase” CRISPR system). This requirement dramatically increases target specificity, since it is unlikely that two off-target nicks will be generated within close enough proximity to cause a DSB.
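
The kind of sticky end produced by two offset nicks depends on which nick lies further to the left. This Python sketch is illustrative. It assumes the top strand is written 5′→3′ from left to right, that both nick positions are given in top-strand coordinates (a nick at position p lies between bases p-1 and p), and that `max_offset` is a hypothetical cutoff beyond which the two nicks are not treated as a single DSB.

```python
def dual_nick_ends(top_nick: int, bottom_nick: int, max_offset: int = 100):
    """Classify the ends left by two nicks on opposite strands.

    Returns (end_type, overhang_length), or None if the nicks are farther apart
    than the hypothetical max_offset cutoff.
    """
    offset = bottom_nick - top_nick
    if abs(offset) > max_offset:
        return None
    if offset == 0:
        return ("blunt", 0)
    # Top-strand nick left of the bottom-strand nick: both fragments carry
    # 5' single-stranded tails; the reverse arrangement gives 3' tails.
    return ("5' overhang" if offset > 0 else "3' overhang", abs(offset))


print(dual_nick_ends(10, 14))  # ("5' overhang", 4)
```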

In certain embodiments, any of the enzymes mentioned above may be a “functional derivative” of a naturally occurring protein. A “functional derivative” of a native sequence polypeptide is a compound having a qualitative biological property in common with a native sequence polypeptide. “Functional derivatives” include, but are not limited to, fragments of a native sequence and derivatives of a native sequence polypeptide and its fragments, provided that they have a biological activity in common with a corresponding native sequence polypeptide. A biological activity contemplated herein is the ability of the functional derivative to hydrolyze a DNA substrate into fragments. The term “derivative” encompasses amino acid sequence variants of a polypeptide, covalent modifications, and fusions thereof. Suitable derivatives of an enzyme or a fragment thereof include but are not limited to mutants, fusions, and covalent modifications of the protein or a fragment thereof. The enzyme, or a fragment thereof, as well as derivatives or a fragment thereof, may be obtainable from a cell or synthesized chemically or by a combination of these two procedures. The cell may be a cell that naturally produces the enzyme. A cell that naturally produces the enzyme may also be genetically engineered to produce the endogenous enzyme at a higher expression level or to produce the enzyme from an exogenously introduced nucleic acid, which nucleic acid encodes an enzyme that is the same as or different from the endogenous enzyme. In some cases, a cell does not naturally produce the enzyme and is genetically engineered to produce the enzyme.

After creating a DSB in the target site 102, the target site 102 is split into two subsequences 102(A), 102(B) on either side of the DSB. Each of the two subsequences 102(A), 102(B) may be between 5 and 20 nucleotides in length. Thus, the target site 102 may be between 10 and 40 nucleotides in length. In some implementations, the two subsequences 102(A), 102(B) may contain identical DNA sequences. The DSB may be located in the middle of the target site 102 or it may be located elsewhere within the target site 102. The schematic shown in FIG. 1 illustrates a DSB with blunt ends, but as described above, DSBs with sticky ends are also covered within the scope of this disclosure.

A homologous repair template 106 is brought into proximity with the dsDNA 100 containing the DSB. The homologous repair template 106 is single-stranded (ss) DNA or ssRNA. The homologous repair template 106 includes a 3′-end sequence 108 complementary to the first subsequence 102(A) of the target site 102 and a 5′-end sequence 110 complementary to the second subsequence 102(B) of the target site 102. Because they are complementary sequences, the lengths of the 3′-end sequence 108 and the 5′-end sequence 110 are the same as, or about the same as, the lengths of the respective subsequences 102(A), 102(B) of the target site. Thus, both the 3′-end sequence 108 and the 5′-end sequence 110 may be between 5 and 20 nucleotides in length. The middle portion of the homologous repair template 106 contains a region 112 encoding a second target site 116. This middle region 112 may contain two subsequences 112(A), 112(B) on either side of the point where the second target site 116 will be cut by a second enzyme. The lengths of the two subsequences 112(A), 112(B) in the middle portion 112 of the homologous repair template 106 may differ from the lengths of the two subsequences 102(A), 102(B) but may fall within the same size range of 5 to 20 nt. Thus, the total length of the homologous repair template 106 may be between about 20 and 80 nt. Because the middle region 112 encodes a second target site 116, the homologous repair template 106 itself provides the basis for this process to be repeated iteratively.
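
The template layout described above can be assembled from its four parts. This Python sketch is illustrative and makes several assumptions. It builds a DNA (not RNA) template. It assumes the template pairs with the top strand of the dsDNA 100. The template, read 5′→3′, is therefore the reverse complement of the repaired top strand 102(A)+112(A)+112(B)+102(B), so its 5′ end pairs with 102(B) and its 3′ end pairs with 102(A), as the text describes. The sequences in the example are arbitrary placeholders.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_template(site_a: str, site_b: str, mid_a: str, mid_b: str) -> str:
    """Return a ssDNA template (5'->3') that inserts mid_a + mid_b between site_a and site_b.

    site_a / site_b: top-strand halves of the current target site (102(A), 102(B)).
    mid_a / mid_b:   halves of the next target site (112(A), 112(B)).
    """
    parts = {"site_a": site_a, "site_b": site_b, "mid_a": mid_a, "mid_b": mid_b}
    for name, part in parts.items():
        if not 5 <= len(part) <= 20:
            raise ValueError(f"{name} must be 5-20 nt, got {len(part)}")
    repaired_top = site_a + mid_a + mid_b + site_b
    # The template pairs with the top strand, so it is that strand's reverse
    # complement: its 5' end pairs with site_b and its 3' end pairs with site_a.
    return reverse_complement(repaired_top)


template = design_template("ACCGT", "TTGCA", "GGATC", "CATGG")
print(template, len(template))  # a 20-nt template, the minimum allowed length
```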

The homologous repair template 106 repairs the DSB through homology-directed repair (HDR). HDR is a mechanism in cells to repair double-strand DNA lesions. HDR includes homologous recombination (HR) and single-strand annealing (SSA) (Lieber, 2010, Annu. Rev. Biochem. 79:181-211). The most common form of HDR is HR, which has the longest sequence homology requirements between the donor and acceptor DNA. Other forms of HDR include SSA and break-induced replication, and these require shorter sequence homology than HR. Homology-directed repair at nicks (single-stranded breaks) can occur via a mechanism distinct from HDR at double-strand breaks (Davis and Maizels, PNAS (0027-8424), 111 (10), p. E924-E932).

An overabundance of the homologous repair template 106 may be provided to increase the efficiency of HDR. The overabundance of the homologous repair template 106 may be provided to a cell-free system by adding additional copies of the ssRNA or ssDNA manually or with the use of microfluidics. The homologous repair template 106 may also be provided, in overabundance if desired, by placing a gene encoding the homologous repair template 106 under the control of a strong promoter and/or by having multiple copies of the gene encoding the homologous repair template 106 under the control of the same promoter.

The 5′-ended DNA strand is resected at the break to create a 3′ overhang. This overhang serves as both a substrate for proteins required for strand invasion and a primer for DNA repair synthesis. The homologous repair template 106 can then displace one strand of the homologous DNA duplex and pair with the other; this causes formation of hybrid DNA referred to as the displacement loop (“D loop”). The recombination intermediates can then be resolved to complete the DNA repair process. As mentioned above, an overabundance of the homologous repair template 106 may be provided. One of ordinary skill in the art will understand how to perform HDR with dsDNA 100 having a DSB and a homologous repair template 106. Possible protocols for performing HDR are provided in Jie Liu et al., In Vitro Assays for DNA Pairing in Recombination-Associated DNA Synthesis, 745 Methods Mol. Bio. 363-383 (2011); Gratz, S., et al., 196 Genetics 967-971 (2014); Richardson, C., et al., 34 Nature Biotechnology 339-344 (2016); and Lin, S., et al., eLife, 2014, 3:e04766.

After the homologous repair template 106 invades the dsDNA, the D loop is formed by hybridization of the 3′-end sequence 108 to the first subsequence 102(A) of the target site 102 and hybridization of the 5′-end sequence 110 to the second subsequence 102(B) of the target site 102. DNA polymerase extends one strand of the dsDNA 100 by synthesizing new ssDNA 114 complementary to the middle portion 112 of the homologous repair template 106. DNA ligase joins the sugar-phosphate backbone of the newly synthesized ssDNA 114 with the remainder of that strand of the dsDNA 100. This forms one strand of the second target site 116.

Following repair of the first strand of the dsDNA 100, the second strand of the dsDNA 100 is repaired by DNA polymerase and DNA ligase using the sequence of the new ssDNA 114 in the repaired first strand as a template. This completes the repair of the dsDNA 100, resulting in dsDNA that includes the second target site 116 inserted within the first target site 102.
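
The net effect of one round of cutting and repair on the top-strand sequence can be simulated as a string operation. This Python sketch only models the outcome. It does not model strand invasion, polymerase synthesis, ligation, or repair of the second strand. It assumes the intact target site must occur exactly once, and the toy sequences are arbitrary placeholders rather than real enzyme sites.

```python
def hdr_insert(dna: str, site_a: str, site_b: str, insert: str) -> str:
    """Top-strand outcome of one HDR round: place insert between site_a and site_b.

    Requires the intact target site (site_a + site_b) to occur exactly once.
    """
    target = site_a + site_b
    first, last = dna.find(target), dna.rfind(target)
    if first == -1 or first != last:
        raise ValueError("target site must occur exactly once")
    cut = first + len(site_a)  # the DSB falls between the two halves
    return dna[:cut] + insert + dna[cut:]


# Round 1: insert a new site (GGGGG|TATAT) into the first site (AAAAA|CCCCC).
round1 = hdr_insert("TTTAAAAACCCCCTTT", "AAAAA", "CCCCC", "GGGGGTATAT")
# Round 2: cut the newly inserted site and nest a third site inside it.
round2 = hdr_insert(round1, "GGGGG", "TATAT", "CGCGCATGCA")
print(round1)  # TTTAAAAAGGGGGTATATCCCCCTTT
print(round2)  # TTTAAAAAGGGGGCGCGCATGCATATATCCCCCTTT
```

After round 1 the original site AAAAACCCCC no longer exists as a contiguous sequence, so only the newly inserted site remains available for the next cut.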

DNA polymerases are enzymes that synthesize DNA molecules from individual deoxyribonucleotides. During this process, DNA polymerase “reads” an existing DNA strand to create a new, complementary strand. DNA ligase is a specific type of enzyme, a ligase, that facilitates the joining of DNA strands by catalyzing the formation of a phosphodiester bond. It plays a role in repairing single-strand breaks. DNA ligase forms a covalent phosphodiester bond between the 3′ hydroxyl end of one nucleotide (the “acceptor”) and the 5′ phosphate end of another (the “donor”). The DNA ligase from bacteriophage T4 is the ligase most commonly used in laboratory research. It can ligate cohesive or “sticky” ends of DNA, oligonucleotides, as well as RNA and RNA-DNA hybrids, but not single-stranded nucleic acids. It can also ligate blunt-ended DNA.

Note that the homologous repair template 106 includes two types of regions: end regions and a middle region. The end regions are homologous to one of the strands of the dsDNA 100 on either side of the DSB. Here, the homologous regions are shown by the 3′-end sequence 108 and the 5′-end sequence 110. The homology need not be 100%; it need only be sufficient for the 3′-end sequence 108 and the 5′-end sequence 110 to hybridize to one strand of the dsDNA 100. The middle region is the middle portion 112 of the homologous repair template 106 that encodes the sequence of the second target site 116. Independently varying the end regions and the middle region allows for creation of multiple different homologous repair templates 106 from a relatively limited set of end regions and middle regions.
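
The combinatorial benefit of varying end regions and middle regions independently can be counted directly. In this Python sketch the site labels are hypothetical. The end regions of a template determine which site it repairs, and the middle region determines which site it leaves behind. With E end-region pairs and M middle regions, up to E×M templates are possible. The rule excluding templates that would re-create the site just cut is a hypothetical design choice, not taken from this disclosure.

```python
from itertools import product

# Hypothetical site labels. A template is (site its end regions target,
# site its middle region encodes).
SITES = ["S1", "S2", "S3"]

templates = [
    (cut_site, new_site)
    for cut_site, new_site in product(SITES, SITES)
    if cut_site != new_site  # hypothetical rule: never re-create the site just cut
]
print(len(templates))  # 6 of the 3 x 3 = 9 possible combinations
print(templates)
```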

Following HDR, the dsDNA 100 includes the first subsequence 102(A) of the first target site 102 followed by the first subsequence 116(A) of the second target site 116. The DNA sequence 118 represented by this order of the two subsequences 102(A), 116(A) of the two target sites represents a binary digit (e.g., 0 or 1). As mentioned above, the length of the subsequence 102(A) is from five to 20 nucleotides and the length of the subsequence 112(A) is also from five to 20 nucleotides. Thus, the bit represented by the DNA sequence 118 is encoded by a sequence of 10 to 40 nucleotides. More precisely, the bit is encoded by a first sequence of five to 20 nucleotides being adjacent to a second sequence of five to 20 nucleotides rather than by the specific identity of each of the five to 20 nucleotides.
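
Because a bit is carried by which first-half subsequence sits next to which, decoding a sequencing read amounts to finding the A halves, ordering them, checking that consecutive ones are adjacent, and looking each pair up in a codebook. This Python sketch is illustrative only. The half-site sequences (`A_HALVES`) and the pair-to-bit mapping (`CODEBOOK`) are hypothetical placeholders; the actual mapping depends on the encoding scheme chosen.

```python
def a_half_order(read: str, a_halves: dict[str, str]) -> list[str]:
    """Site labels sorted by where their A half first occurs in the read."""
    found = sorted((read.find(seq), label) for label, seq in a_halves.items() if seq in read)
    return [label for _, label in found]


def decode_bits(read: str, a_halves: dict[str, str],
                codebook: dict[tuple[str, str], int]) -> list[int]:
    """Translate each adjacent pair of A halves into a bit via the codebook."""
    order = a_half_order(read, a_halves)
    bits = []
    for left, right in zip(order, order[1:]):
        # The bit lies in the adjacency of two A halves, not in their individual bases.
        if a_halves[left] + a_halves[right] not in read:
            raise ValueError(f"{left} and {right} are not adjacent")
        bits.append(codebook[(left, right)])
    return bits


A_HALVES = {"T1": "AAAAA", "T2": "GGGGG", "T3": "CACAC"}  # hypothetical sequences
CODEBOOK = {("T1", "T2"): 0, ("T1", "T3"): 1}             # hypothetical mapping
read = "TT" + "AAAAA" + "CACAC" + "TATAT" + "CCCCC" + "TT"
print(decode_bits(read, A_HALVES, CODEBOOK))  # [1]
```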

FIG. 2 shows schematic illustrations of further manipulations performed on the dsDNA 100 molecule of FIG. 1. The second enzyme 200 creates a second DSB in the second target site 116. The second target site 116 has a different sequence than the first target site 102, and thus the second enzyme 200 recognizes a different DNA sequence than the first enzyme 104. Creating a DSB in the second target site 116 leaves the first subsequence 116(A) of the second target site 116 on one side of the DSB and a second subsequence 116(B) of the second target site 116 on the other side. In some implementations, the first subsequence 116(A) and the second subsequence 116(B) may have the same sequence, and therefore the same nucleotide length. In that case, the second target site 116 may be thought of as a single subsequence repeated once.

A second homologous repair template 202 contacts the dsDNA 100 to provide a template for HDR of the DSB. The second homologous repair template 202 includes a 3′-end region 204 that is homologous to one strand of the dsDNA 100 within the first subsequence 116(A) of the second target site 116. The second homologous repair template 202 also includes a 5′-end region 206 that is homologous to one strand of the dsDNA 100 within the second subsequence 116(B) of the second target site 116. The second homologous repair template 202 further includes a middle portion 208 that encodes a third target site 212 for a third enzyme. The middle region 208 includes a first subsequence 208(A) on one side of the DSB and a second subsequence 208(B) on the other side of the DSB.

Annealing of the second homologous repair template 202 to one strand of the dsDNA 100 creates a D loop by hybridization of the 3′-end sequence 204 to the subsequence 116(A) and hybridization of the 5′-end sequence 206 to the subsequence 116(B). DNA polymerase and DNA ligase repair the strand of the dsDNA 100 to which the second homologous repair template 202 is hybridized by creating new DNA 210. The second strand of the dsDNA 100 is then repaired using the first strand as a template.

The dsDNA 100 now includes the third target site 212 inserted into the middle of the second target site 116 (which is itself inserted in the middle of the first target site 102). The order of the subsequence 116(A) followed by the subsequence 212(A) encodes a second binary digit 214. This process can be repeated to encode any number of binary digits 118, 214 within the dsDNA 100.

The encoding scheme described herein allows insertion of DNA sequences representing an unbounded number of bits using only three different target sequences and six different homologous repair templates, as explained below. The three target sequences are represented as X₁X₂, Y₁Y₂, and Z₁Z₂. The first portion of each target sequence (e.g., X₁, Y₁, or Z₁) corresponds to subsequence 102(A) or subsequence 116(A) shown in FIG. 1. The remaining portion of each target sequence (e.g., X₂, Y₂, or Z₂) corresponds to subsequence 102(B) or subsequence 116(B) shown in FIG. 1. Thus, each X, Y, and Z represents a DNA sequence of about 5 to 20 nucleotides such as, for example only, ACTGAA, GCCTCAT, TGACG, etc. In some implementations X₁=X₂, etc., but in other implementations the first portion of a target sequence may differ in sequence and/or length from the remaining portion.

The homologous repair templates all have end regions that are homologous to one of the target sequences. Thus, the homologous repair templates have sequences of the structure X₁_ _X₂, Y₁_ _Y₂, and Z₁_ _Z₂. Recall that the middle region of each homologous repair template itself encodes a target site. Thus, the middle region of any given homologous repair template is one of X₁X₂, Y₁Y₂, or Z₁Z₂. To precisely control the location of insertion, a homologous repair template does not encode the target site into which it is to be inserted. For example, X₁X₁X₂X₂ is not a valid homologous repair template under this encoding. Thus, if the target sequence is “X” the middle region of the homologous repair template may encode the target sequence for “Y” or “Z”; if the target sequence is “Y” the middle region may encode “X” or “Z”; and if the target sequence is “Z” the middle region may encode “X” or “Y”. This leads to the six homologous repair templates: X₁Y₁Y₂X₂, X₁Z₁Z₂X₂, Y₁X₁X₂Y₂, Y₁Z₁Z₂Y₂, Z₁X₁X₂Z₂, and Z₁Y₁Y₂Z₂.
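The rule that a template never re-creates the site it is inserted into can be expressed compactly. The Python sketch below is illustrative only; the labels "X1", "Y2", and so on are hypothetical plain-ASCII stand-ins for X₁, Y₂, etc. It enumerates ordered pairs of distinct target sites (outer site, inner site) and yields exactly the six templates listed above.

```python
from itertools import permutations

# Hypothetical ASCII labels: "X1" stands for X₁, "Y2" for Y₂, and so on.
TARGETS = ("X", "Y", "Z")


def repair_templates(targets=TARGETS):
    """Return every valid template of the form outer1 inner1 inner2 outer2.

    The end regions (outer1, outer2) match the target site being cut; the middle
    region (inner1 inner2) encodes a different target site, so a template never
    re-creates the site it is inserted into (e.g. X1X1X2X2 is excluded).
    """
    return [
        f"{outer}1{inner}1{inner}2{outer}2"
        for outer, inner in permutations(targets, 2)  # ordered pairs with outer != inner
    ]


print(repair_templates())
# ['X1Y1Y2X2', 'X1Z1Z2X2', 'Y1X1X2Y2', 'Y1Z1Z2Y2', 'Z1X1X2Z2', 'Z1Y1Y2Z2']
```

With n target sites the same rule gives n(n−1) templates, which is why three sites yield six.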

A context-dependent encoding using three target sites and six homologous repair templates is shown in Table 1 below. This is only one possible encoding; other encodings that use a greater number of target sites and homologous repair templates are also possible. Moreover, it is also possible to use multiple encodings in the same cell or system.

TABLE 1 Context-dependent binary encoding

Current State   Repair Template   Encoded Bit
X₁X₂            X₁Y₁Y₂X₂          0
X₁X₂            X₁Z₁Z₂X₂          1
Y₁Y₂            Y₁X₁X₂Y₂          0
Y₁Y₂            Y₁Z₁Z₂Y₂          1
Z₁Z₂            Z₁X₁X₂Z₂          0
Z₁Z₂            Z₁Y₁Y₂Z₂          1

The current state represents the sequence present in the dsDNA 100 at the time a given DSB is created. The current state may be tracked by a computer that is provided with a record of the initial target site 102 of the dsDNA 100 and with the sequence of each homologous repair template 106, 202 as the respective templates are brought into contact with the dsDNA 100. Thus, by referencing the current state as stored in the computer, the appropriate homologous repair template can be selected from Table 1 (or a similar table for a different encoding) to encode the desired next bit.
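A minimal sketch of this computer-side bookkeeping, assuming the state is held as a single symbol for the innermost target site and that Table 1 is the encoding in use. The function name `encode` and the ASCII labels ("X1" for X₁) are hypothetical. Each new site is placed between the two halves of the site that was cut, so the sequence is tracked as a left half and a right half that grow toward the center.

```python
# Table 1 as a lookup: (innermost target site, next bit) -> target site encoded by
# the middle region of the template to use.
NEXT_SITE = {
    ("X", 0): "Y", ("X", 1): "Z",
    ("Y", 0): "X", ("Y", 1): "Z",
    ("Z", 0): "X", ("Z", 1): "Y",
}


def encode(bits, start="X"):
    """Return (final symbolic sequence, templates in the order they are applied)."""
    left, right = [start + "1"], [start + "2"]  # halves on either side of the next cut
    current = start  # the computer's record of the current state
    templates = []
    for bit in bits:
        nxt = NEXT_SITE[(current, bit)]
        templates.append(f"{current}1{nxt}1{nxt}2{current}2")
        left.append(nxt + "1")       # new site lands between the halves of the cut site
        right.insert(0, nxt + "2")
        current = nxt
    return "".join(left + right), templates


sequence, templates = encode([0, 1, 0, 1, 0, 1])
print(sequence)       # X1Y1Z1X1Z1X1Z1Z2X2Z2X2Z2Y2X2
print(templates[:2])  # ['X1Y1Y2X2', 'Y1Z1Z2Y2']
```

For the bit string 010101 this reproduces the last row of Table 3.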

The current state may also be recorded biochemically by using a genetically designed bi-stable switch. With one or more bi-stable switches, the current state may be recorded by creating molecular records, through a positive feedback loop, of which enzyme and/or homologous repair template was used last. Recall that the example encoding above makes it possible to encode any binary sequence using three enzymes (i.e., one for each target site) and six homologous repair templates. In some implementations, genes encoding each of the enzymes and each of the homologous repair templates may also be present in the cell, and each of the genes may be regulated by specific, known promoters. The genes that make up a given enzyme and the accompanying regulatory elements may be included in one or more operons. An operon is a contiguous region of DNA that includes cis-regulatory regions (e.g., repressor binding sites and promoters) and the coding regions for one or more genes or functional RNAs (siRNA, tracrRNA, gRNA, shRNA, etc.). The operon may be delivered in a circular vector or may be inserted into a linear chromosome. Activation of the genes, by upregulation or by cessation of suppression, may increase the amount of the desired enzyme and/or homologous repair template. Activation may also generate tracking molecules that can be used to monitor state. The tracking molecules may be part of the operons.

A “repressor” (and/or “knockdown”) is a protein or RNA (e.g., short hairpin RNA (shRNA) or interfering RNA (RNAi or siRNA)) that binds to DNA or RNA and either blocks attachment of the polymerase to the promoter, blocks elongation by the polymerase during transcription, or blocks translation of mRNA.

The current level of a tracking molecule can serve as an on/off regulation signal for the genes of a given operon. In a system with a bi-stable repressor, the repressor has two states, 0 and 1, which are flipped after each operation. For example, after the homologous repair template X₁Y₁Y₂X₂ is used (e.g., as identified by its concentration rising above a threshold level), an associated tracking molecule may set the state of the bi-stable repressor. Continuing with this example, each of the homologous repair templates may be associated with a different bi-stable repressor, and at any given time five of the bi-stable repressors may be in a state associated with “off” while the sixth is in a state associated with “on.” Thus, by examining the states of the bi-stable repressors, it is possible to identify which homologous repair template was used last. A similar mechanism may keep track of which enzyme was used last.
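The one-of-six bookkeeping described above behaves like a one-hot register. The following Python sketch models it under simplifying assumptions: each repressor is reduced to a 0/1 value, "use" of a template is detected purely by its concentration exceeding a single threshold, and the function names are hypothetical. It shows how the last template used can be read back from the repressor states.

```python
# Hypothetical one-hot model of the six bi-stable repressors: index i tracks template i.
TEMPLATES = ("X1Y1Y2X2", "X1Z1Z2X2", "Y1X1X2Y2", "Y1Z1Z2Y2", "Z1X1X2Z2", "Z1Y1Y2Z2")


def update_repressors(states, concentrations, threshold):
    """Set the repressor of the template whose concentration exceeds the threshold
    to 'on' (1) and all others to 'off' (0). If zero or several templates exceed
    the threshold, the use is ambiguous and the previous record is kept unchanged."""
    above = [i for i, c in enumerate(concentrations) if c > threshold]
    if len(above) != 1:
        return list(states)
    return [1 if i == above[0] else 0 for i in range(len(states))]


def last_template_used(states):
    """Read back which template was used last from the one-hot repressor states."""
    on = [t for t, s in zip(TEMPLATES, states) if s == 1]
    if len(on) != 1:
        raise ValueError("expected exactly one repressor in the 'on' state")
    return on[0]


states = update_repressors([0] * 6, [0.0, 0.1, 5.0, 0.0, 0.2, 0.0], threshold=1.0)
print(last_template_used(states))  # Y1X1X2Y2
```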

To avoid interference by molecules remaining from an earlier iteration, flipping of the bi-stable repressor may be handled as a multistage process: the process first pauses until editing of the dsDNA has stopped, then switches the state of the repressor using a temporally decaying signal that indicates which state the repressor should change to. The temporally decaying signal is initiated during the active stage of the last iteration. Once the signal has decayed below a threshold level and the repressor has fully switched into the new state, operons regulating molecules used for the next iteration are unblocked. The operon that lacks the repressor corresponding to the current state and has a promoter for the current input signal is then able to proceed with transcription.
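The multistage flip can be viewed as a small state machine. The Python sketch below is a hypothetical discrete-time model, not a description of any specific genetic circuit: `editing_active` and `signal` are assumed to be sampled once per time step, and the repressor is treated as having fully switched as soon as the signal falls below the threshold.

```python
from enum import Enum, auto


class Stage(Enum):
    WAIT_FOR_EDITING_TO_STOP = auto()
    WAIT_FOR_SIGNAL_DECAY = auto()
    UNBLOCKED = auto()  # operons for the next iteration may transcribe


def step(stage, editing_active, signal, threshold):
    """Advance the hypothetical repressor-flip controller by one time step."""
    if stage is Stage.WAIT_FOR_EDITING_TO_STOP and not editing_active:
        return Stage.WAIT_FOR_SIGNAL_DECAY
    if stage is Stage.WAIT_FOR_SIGNAL_DECAY and signal < threshold:
        return Stage.UNBLOCKED
    return stage


def first_unblocked_step(editing_trace, signal_trace, threshold):
    """Return the first time index at which the next iteration is unblocked, or None."""
    stage = Stage.WAIT_FOR_EDITING_TO_STOP
    for t, (editing, signal) in enumerate(zip(editing_trace, signal_trace)):
        stage = step(stage, editing, signal, threshold)
        if stage is Stage.UNBLOCKED:
            return t
    return None


# Editing stops at t=2; the decaying signal first drops below 1.0 at t=4.
print(first_unblocked_step([True, True, False, False, False], [8, 4, 2, 1, 0.5], 1.0))  # 4
```

The ordering of the two checks enforces the sequence in the text: a signal that has already decayed does not unblock the next iteration while editing is still in progress.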

A person having ordinary skill in the art will know how to create a bi-stable switch using proteins that serve as transcription factors or repressors having DNA-binding domains. A “transcription factor” is a protein that binds near the beginning of the coding sequence (transcription start site) for a gene or functional mRNA. Transcription factors are necessary for recruiting polymerase to transcribe DNA. Techniques for creating and using bi-stable molecular switches are described in Lebar, et al., A bistable genetic switch based on designable DNA-binding domains, 5 Nature Communications 5007 (2014).

One implementation of bi-stable switches is shown below in Table 2.

TABLE 2 Bi-stable switch constructs

Prev State  Next Bit  Next State  Prev Symbol  Next Cut   Next Insert  Genetic Construct
R0          0         R1 P0       XX | WW      X^X | W^W  XYYX | WYYW  R_P_R01 P10 X^XT XYYX W^WT WYYW
R1          0         R0 P0       YY | ZZ      Y^Y | Z^Z  YXXY | ZXXZ  R_P_R10 P10 Y^YT YXXY Z^ZT ZXXZ
R0          1         R1 P0       XX | WW      X^X | W^W  XZZX | WZZW  R_P_R01 P10 X^XT XZZX W^WT WZZW
R1          1         R0 P0       YY | ZZ      Y^Y | Z^Z  YWWY | ZWWZ  R_P_R10 P10 Y^YT YWWY Z^ZT ZWWZ

Repressor binding sites are represented by R_. Repressors bind in R1 but not in R0. A bound repressor blocks the promoter. The promoter binding site is represented by P_. The promoter may be inducible with an additional signal that turns writing on or off completely. The agent that toggles P from on to off is represented as P10. Depending on the specific implementation, P10 may be DNA, mRNA, or protein. The agent that toggles R from off to on is represented as R01, and the agent that toggles R from on to off is represented as R10. Depending on the specific implementation, both R01 and R10 may be DNA, mRNA, or protein. The enzyme that cuts between X₁ and X₂ is represented by X^X (and correspondingly Y^Y, Z^Z, or W^W for the other target sites). If the enzyme is the CRISPR/Cas9 system, X^X represents the DNA/mRNA for a gRNA that includes an X^X spacer. T represents tracrRNA that binds to Cas9.

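The main difference from Table 1 is that this implementation uses an alphabet of four target sites (X, Y, Z, W) but needs only one previous-state variable. In state R0 the current site is always X₁X₂ or W₁W₂, and in state R1 it is always Y₁Y₂ or Z₁Z₂; because the enzymes and templates for both possible sites of a state are expressed together, only the repressor state, rather than the identity of the last symbol, must be tracked. This implementation may provide some efficiencies for encoding and engineering.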
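The Table 2 scheme can be simulated to confirm that a single bi-stable state variable suffices. The Python sketch below is a hypothetical encoding of the table as lookups, written for illustration: the inserted symbol depends only on (repressor state, bit), and each write flips the state. Decoding needs no state at all, since in Table 2 an inserted X or Y always encodes 0 and an inserted Z or W always encodes 1.

```python
# Table 2 as lookups (hypothetical encoding of the table). Repressor state 0 (R0)
# means the current site is X or W; state 1 (R1) means it is Y or Z.
INSERT = {(0, 0): "Y", (0, 1): "Z", (1, 0): "X", (1, 1): "W"}
SITES_IN_STATE = {0: {"X", "W"}, 1: {"Y", "Z"}}


def encode_four_letter(bits, start="X"):
    """Return the target-site symbols, outermost first, for the given bits."""
    if start not in SITES_IN_STATE[0]:
        raise ValueError("the initial site must be X or W (repressor state R0)")
    state, symbols = 0, [start]
    for bit in bits:
        # Both cutting enzymes for the current state are expressed, so only
        # (state, bit) is needed to pick the inserted symbol.
        symbols.append(INSERT[(state, bit)])
        state = 1 - state  # every write flips the single bi-stable repressor
    return symbols


def decode_four_letter(symbols):
    """Inserted X or Y encodes 0; inserted Z or W encodes 1 (read off Table 2)."""
    return [0 if s in ("X", "Y") else 1 for s in symbols[1:]]


symbols = encode_four_letter([0, 1, 1, 0])
print(symbols)                      # ['X', 'Y', 'W', 'Z', 'X']
print(decode_four_letter(symbols))  # [0, 1, 1, 0]
```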

The current state is used to identify which enzyme 104, 200 is appropriate for creating a DSB in the dsDNA 100. The enzyme 104, 200 is brought into interaction with the dsDNA 100. For example, when the current state is X₁X₂, the selected enzyme is able to create a DSB in the X₁X₂ sequence. The following example in Table 3 shows how the binary values 010101 may be encoded in the dsDNA 100 with the initial target sequence 102 of X₁X₂.

TABLE 3 Example of encoding binary values in DNA

DNA Sequence                               Encoded Pairs                    Binary Value
. . . X₁X₂ . . .                           no value by itself
. . . X₁Y₁Y₂X₂ . . .                       XY = 0                           0
. . . X₁Y₁Z₁Z₂Y₂X₂ . . .                   XY = 0; YZ = 1                   01
. . . X₁Y₁Z₁X₁X₂Z₂Y₂X₂ . . .               XY = 0; YZ = 1; ZX = 0           010
. . . X₁Y₁Z₁X₁Z₁Z₂X₂Z₂Y₂X₂ . . .           XY = 0; YZ = 1; ZX = 0; XZ = 1   0101
. . . X₁Y₁Z₁X₁Z₁X₁X₂Z₂X₂Z₂Y₂X₂ . . .       . . . ZX = 0; XZ = 1; ZX = 0     01010
. . . X₁Y₁Z₁X₁Z₁X₁Z₁Z₂X₂Z₂X₂Z₂Y₂X₂ . . .   . . . XZ = 1; ZX = 0; XZ = 1     010101
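Reading the bits back, as when a sequence file is interpreted, amounts to walking the nested first halves from the outside in and looking up each adjacent pair in Table 1. The Python sketch below assumes the sequence has already been reduced to target-site symbols (the mapping from raw reads to symbols is not shown), and `decode` is a hypothetical name. It also checks that the second halves mirror the first halves, which is what correct nesting implies.

```python
import re

# Inverse of Table 1: (site that was cut, site inserted into it) -> encoded bit.
BIT_OF = {
    ("X", "Y"): 0, ("X", "Z"): 1,
    ("Y", "X"): 0, ("Y", "Z"): 1,
    ("Z", "X"): 0, ("Z", "Y"): 1,
}


def decode(sequence):
    """Recover bits from a symbolic sequence such as 'X1Y1Y2X2' or 'X₁Y₁Y₂X₂'."""
    sequence = sequence.translate(str.maketrans("₁₂", "12"))  # accept subscripts
    tokens = re.findall(r"[XYZ][12]", sequence)
    firsts = [t[0] for t in tokens if t[1] == "1"]   # outer-to-inner, left of center
    seconds = [t[0] for t in tokens if t[1] == "2"]  # inner-to-outer, right of center
    if seconds != firsts[::-1]:
        raise ValueError("first and second halves are not properly nested")
    # Each adjacent pair (outer site, site inserted into it) is one bit.
    return [BIT_OF[(a, b)] for a, b in zip(firsts, firsts[1:])]


print(decode("X₁Y₁Z₁X₁Z₁X₁Z₁Z₂X₂Z₂X₂Z₂Y₂X₂"))  # [0, 1, 0, 1, 0, 1]
```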

In one example implementation, using Cas9 with a PAM sequence of NNNNGATT as the enzyme, the three target sites may be:

X₁ = TAGCCGTATCGAGCATCGATG|CGCNNNNGATT = X₂
Y₁ = GATCGATGGACTCTGCATCTA|TCGNNNNGATT = Y₂
Z₁ = CGGGACGATCGATCGGGCTAG|ACTNNNNGATT = Z₂

Here, the PAM sequence is the NNNNGATT at the end of each of X₂, Y₂, and Z₂. X₁ is (SEQ ID NO: 1), X₂ is (SEQ ID NO: 2), Y₁ is (SEQ ID NO: 3), Y₂ is (SEQ ID NO: 4), Z₁ is (SEQ ID NO: 5), and Z₂ is (SEQ ID NO: 6). Note that each of X₁, Y₁, and Z₁ is 21 bp long.

Each of the target sites is recognized by a corresponding guide sequence, written below as ssDNA, that directs the dsDNA to be cut at the location indicated by “^”. Each guide should have a trans-activating crRNA (tracrRNA), a small trans-encoded RNA that attaches to Cas9, appended to its end. The respective ssDNA sequences are:

(SEQ ID NO: 1) gX₁ = TAGCCGTATCGAGCATCGATG^CGC
(SEQ ID NO: 3) gY₁ = GATCGATGGACTCTGCATCTA^TCG
(SEQ ID NO: 5) gZ₁ = CGGGACGATCGATCGGGCTAG^ACT

The homology directed repair template X₁Y₁Y₂X₂ then has the sequence TAGCCGTATCGAGCATCGATG|GATCGATGGACTCTGCATCTA|TCGNNNNGATT|CGCNNNNGATT (SEQ ID NO: 7), and the homology directed repair template Y₁X₁X₂Y₂ has the sequence GATCGATGGACTCTGCATCTA|TAGCCGTATCGAGCATCGATG|CGCNNNNGATT|TCGNNNNGATT (SEQ ID NO: 8). Other homology directed repair templates can be designed according to the same pattern.

An initial cut of the target site X₁X₂ creates a DSB that appears as follows (only one strand of the dsDNA is shown): . . . TAGCCGTATCGAGCATCGATG CGCNNNNGATT . . . (SEQ ID NOs: 1 and 2). After HDR with X₁Y₁Y₂X₂, one strand of the dsDNA has the following sequence, which now includes the target site Y₁Y₂ (GATCGATGGACTCTGCATCTA followed by TCGNNNNGATT):

(SEQ ID NO: 7) . . . TAGCCGTATCGAGCATCGATG|GATCGATGGACTCTGCATCTA∥TCGNNNNGATT|CGCNNNNGATT . . .

The dsDNA can now be cut by a Cas9 carrying gY₁, creating a DSB at the location represented by “∥”. HDR may then be performed with Y₁X₁X₂Y₂, for example, further adding to the dsDNA and completing another iteration of encoding. This may be continued with various sequences of cuts and homologous repair templates to encode any series of bits.
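The two iterations just described can be checked at the sequence level with a short string model. The Python sketch below is an idealization, not a model of Cas9 or HDR chemistry: it assumes every cut falls exactly at the “^” position (between the two halves of a target site), that every repair succeeds, and that only one strand needs to be tracked. "N" is kept as a literal placeholder base, the "AAAA"/"TTTT" flanks are arbitrary, and the function names are hypothetical.

```python
# Half-sites from the example above; "N" is kept as a literal placeholder base.
PARTS = {
    "X": ("TAGCCGTATCGAGCATCGATG", "CGCNNNNGATT"),
    "Y": ("GATCGATGGACTCTGCATCTA", "TCGNNNNGATT"),
    "Z": ("CGGGACGATCGATCGGGCTAG", "ACTNNNNGATT"),
}


def hdr_template(outer, inner):
    """Template outer1 + inner1 + inner2 + outer2, e.g. X1Y1Y2X2 (SEQ ID NO: 7)."""
    (o1, o2), (i1, i2) = PARTS[outer], PARTS[inner]
    return o1 + i1 + i2 + o2


def cut_and_repair(dna, outer, inner):
    """Idealized single-strand model: cut between outer1 and outer2 (the '^'
    position of the guide) and insert inner1 + inner2 from the template's middle."""
    o1, o2 = PARTS[outer]
    site = o1 + o2
    if dna.count(site) != 1:
        raise ValueError(f"site {outer}1{outer}2 must occur exactly once")
    cut = dna.index(site) + len(o1)
    i1, i2 = PARTS[inner]
    return dna[:cut] + i1 + i2 + dna[cut:]


left, right = "AAAA", "TTTT"  # arbitrary flanking sequence
dna = left + "".join(PARTS["X"]) + right
dna = cut_and_repair(dna, "X", "Y")  # repair with X1Y1Y2X2
assert dna == left + hdr_template("X", "Y") + right
dna = cut_and_repair(dna, "Y", "X")  # cut with gY1, repair with Y1X1X2Y2
print(dna)
```

The uniqueness check mirrors the requirement that the site being cut occur only once; after each iteration, only the innermost site is contiguous.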

Illustrative Processes

For ease of understanding, the processes discussed in this disclosure are delineated as separate operations represented as independent blocks. However, these separately delineated operations should not be construed as necessarily order dependent in their performance. The order in which the process is described is not intended to be construed as a limitation, and any number of the described process blocks may be combined in any order to implement the process, or an alternate process. Moreover, it is also possible that one or more of the provided operations may be modified or omitted.

FIG. 3 shows process 300 for iteratively adding DNA to the site of a DSB in a dsDNA molecule. The process 300 corresponds in part to the schematics shown in FIGS. 1 and 2.

At 302, a first DSB is created at a first target site 102 in the dsDNA 100 with a first enzyme 104. To limit where the first enzyme 104 cuts the dsDNA 100, the first target site 102 may be unique in the dsDNA at the time of making the first DSB. The first target site 102 may also be unique across a population of dsDNA that is available for the first enzyme to act on. For example, if there are multiple circular dsDNA molecules within a cell, the first target site 102 may exist only once within the entire population of circular dsDNA molecules. Alternatively, the first target site 102 may be unique per dsDNA molecule, but the first enzyme 104 may have access to multiple different dsDNA molecules, each including one instance of the first target site 102. It is understood by persons having ordinary skill in the art that the enzyme (even if referred to in the singular herein) may include a plurality of individual and equivalent enzyme molecules. In some implementations, the first target site 102 may include a first subsequence 102(A) that is repeated once, resulting in a second subsequence 102(B) that is the same as the first subsequence 102(A). For example, if the first subsequence 102(A) is GTACTA, then the second subsequence 102(B) is the same and the sequence of the target site 102 is GTACTAGTACTA (SEQ ID NO: 9).
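Whether a candidate target site is unique across a population of molecules can be checked computationally before it is used. The Python sketch below is one possible check, written under stated assumptions: it scans both strands (via the reverse complement), optionally wraps across the junction of circular molecules, counts a palindromic site only once per position, and uses hypothetical function names. Real enzyme tolerance to mismatches is not modeled.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string; N maps to N."""
    return seq[::-1].translate(str.maketrans("ACGTN", "TGCAN"))


def count_overlapping(text, pattern, n_starts):
    """Count occurrences of pattern starting at positions 0..n_starts-1 of text."""
    return sum(1 for i in range(n_starts) if text.startswith(pattern, i))


def site_occurrences(molecules, site, circular=True):
    """Count copies of site on either strand across a population of molecules.
    For circular molecules, the scan wraps across the origin junction."""
    rc = reverse_complement(site)
    total = 0
    for seq in molecules:
        if circular:
            text, n_starts = seq + seq[: len(site) - 1], len(seq)
        else:
            text, n_starts = seq, max(len(seq) - len(site) + 1, 0)
        total += count_overlapping(text, site, n_starts)
        if rc != site:  # a palindromic site is the same on both strands; count once
            total += count_overlapping(text, rc, n_starts)
    return total


site = "GTACTAGTACTA"  # SEQ ID NO: 9
population = ["GTACTACCCCGTACTA", "CCCCCCCCCCCCCCCC"]
print(site_occurrences(population, site))                  # 1 (spans the circular junction)
print(site_occurrences(population, site, circular=False))  # 0
```

A result of exactly 1 satisfies the population-wide uniqueness condition described above; the per-molecule alternative would apply the same count to each molecule separately.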

The first enzyme 104 may be any of the illustrative types of enzymes identified above, such as a restriction enzyme, an HE, a CRISPR/Cas system, a TALEN, or a zinc finger.

At 304, a first homologous repair template 106 to encode a first binary digit 118 is selected. Given the binary digit 118 to encode according to an encoding scheme, the identity of the first target site 102 is used to select the first homologous repair template 106. The homologous repair template 106 may include a 3′-end sequence and a 5′-end sequence, each encoding a second subsequence that is complementary to the first subsequence 102(A), 102(B) in the first target site 102. Thus, in this implementation the 3′-end sequence 108 and the 5′-end sequence 110 have the same sequence, but in other implementations they may have different sequences. The first homologous repair template 106 may also include a middle portion that includes two adjacent instances of a third subsequence 112(A), 112(B) that forms the next target site after insertion into the dsDNA.

As mentioned above, selecting the homologous repair template 106 may be based on an encoding scheme such as that shown in Table 1 above.

At 306, it is determined whether the next digit is 0. In a binary system this is functionally equivalent to determining whether the next digit is not 1. If the next digit to be encoded is 0, the process 300 proceeds to 308 and selects a homologous repair template 106 that, when inserted into the target site 102, encodes a 0. If the next binary digit is not 0, and thus is 1, the process 300 proceeds to 310 and selects a homologous repair template 106 that, when inserted into the target site 102, encodes a 1. The appropriate binary digit 118 is represented by the partial sequence 102(A) of the target site 102 followed by the partial sequence 116(A) contributed by the middle portion 112 of the homologous repair template 106.

At 312, the homologous repair template 106 is generated. The homologous repair template 106 may be generated at 312(A) artificially by an oligonucleotide synthesizer. Oligonucleotide synthesizers are discussed in more detail below.

At 312(B), the homologous repair template 106 may be generated by a gene under control of a regulated promoter. The mRNA gene product may be converted to DNA through use of reverse transcriptase to create a DNA molecule that is the final homologous repair template 106. In some implementations the mRNA may itself serve as a homologous repair template without first being converted to DNA. See Rafael Yáñez-Muñoz, RNA to repair DNA—what next?, British Society for Gene and Cell Therapy (Jan. 14, 2015), available online at www.bsgct.org/ma-to-repair-dna-what-next/; Keskin, H., Shen, Y., et al., Transcript-RNA-templated DNA recombination and repair, 515 Nature 436-439 (2014); and Storici, F., Bebenek, K., et al., RNA-templated DNA repair, 447 Nature 338-341 (2007).

At 314, the dsDNA 100 is contacted with the first homologous repair template 106. Contacting the dsDNA 100 with the first homologous repair template 106 may involve adding or moving multiple copies of the first homologous repair template 106 into a chamber that contains the dsDNA 100, for example, by a microfluidics system. In one implementation, contacting the dsDNA 100 involves upregulating expression of a gene encoding an RNA sequence that is itself the homologous repair template 106 or that serves as a template for creation of the homologous repair template 106. Upregulating expression of the gene may include activating a promoter controlling transcription of the gene and/or inhibiting action of a repressor.

Typically, this contacting occurs in an environment with multiple copies of the dsDNA 100 and multiple copies of the homologous repair template 106. There may be hundreds, thousands, or more copies of each of the dsDNA 100 and the homologous repair template 106. After the dsDNA 100 is contacted with the homologous repair template 106, a subset of the original dsDNA 100 and homologous repair templates 106 will successfully perform homology directed repair to add the middle portion of the homologous repair template 106 to the dsDNA 100. A different subset of the dsDNA 100 will not be repaired, or will be repaired other than by incorporating the middle portion of the homologous repair template 106. Discussions herein may describe the dsDNA 100 as if every molecule is successfully repaired by the homologous repair template 106, but persons of ordinary skill in the art will understand that only a subset of a population of dsDNA molecules is repaired in this way.
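Because only a subset of molecules is correctly repaired at each step, the fraction carrying a complete record shrinks as iterations accumulate. The Python sketch below uses a deliberately simple model that is an assumption, not something stated in the text: every iteration succeeds independently with the same probability `p_success` (the value 0.5 is purely illustrative), and a molecule that misses one insertion never recovers. Under that model the expected fully written fraction is p^n, and a Monte Carlo run agrees with it.

```python
import random


def expected_fraction_fully_written(p_success, n_iterations):
    """Expected fraction of molecules carrying every intended insertion, assuming
    each iteration succeeds independently with the same probability p_success."""
    return p_success ** n_iterations


def simulate_population(n_molecules, p_success, n_iterations, seed=0):
    """Monte Carlo version of the same model: a molecule counts as fully written
    only if every one of its HDR events incorporated the template's middle portion."""
    rng = random.Random(seed)
    written = sum(
        all(rng.random() < p_success for _ in range(n_iterations))
        for _ in range(n_molecules)
    )
    return written / n_molecules


print(expected_fraction_fully_written(0.5, 3))  # 0.125
print(simulate_population(20_000, 0.5, 3))      # close to 0.125
```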

At 316, it is determined whether an additional DSB is to be created in the dsDNA 100 and HDR performed with a second homologous repair template 202. If yes, because, for example, another binary digit is to be encoded in the dsDNA 100, the process 300 proceeds along the “yes” path to 318.

At 318, the dsDNA 100 may be treated to remove or inactivate any of the first enzyme 104 and/or the first homologous repair template 106 that remain. This reduces the potential for enzyme or homologous repair template remaining from an earlier iteration to create an unintended cut or HDR in a later iteration. In one implementation, such as a cell-free microfluidics system, the dsDNA 100 may be washed to remove the first enzyme and/or the first homologous repair template. In one implementation, there may be a waiting time of sufficient length for the concentration of the first enzyme or the first homologous repair template to decrease below a threshold level. The threshold level may be a level at which little or substantially no enzyme/homologous repair template activity occurs. The threshold level may be identified by experimentation. Natural degradation processes in a cell may cause enzymes and free ssDNA or ssRNA to degrade over time. The degradation speed may be increased by addition of proteasomes or nucleases. Homologous repair templates may also be made inactive by addition of complementary ssDNA sequences that anneal to the homologous repair templates, creating dsDNA and preventing the homologous repair templates from invading the dsDNA with a DSB.
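If the leftover enzyme or template is assumed to decay by first-order kinetics (an assumption; the text leaves the degradation model and the threshold to experimentation), the waiting time follows directly from the half-life: t = ln(C₀/C_threshold)/k with k = ln 2 / t½. The Python sketch below computes that time; the concentrations and half-life are illustrative numbers in arbitrary units, and `wait_time` is a hypothetical name.

```python
import math


def wait_time(c0, c_threshold, half_life):
    """Time for a concentration decaying by first-order kinetics to fall from c0 to
    c_threshold: t = ln(c0 / c_threshold) / k, with k = ln(2) / half_life."""
    if c_threshold <= 0 or half_life <= 0:
        raise ValueError("threshold and half-life must be positive")
    if c0 <= c_threshold:
        return 0.0  # already below the threshold; no wait needed
    k = math.log(2) / half_life
    return math.log(c0 / c_threshold) / k


# Falling 8-fold takes three half-lives: 3 x 20 = 60 time units.
print(round(wait_time(100.0, 12.5, half_life=20.0), 6))  # 60.0
```

In this model, adding proteasomes or nucleases corresponds to shortening the half-life, which shortens the wait proportionally.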

If no additional DSB is to be created at 316, a sequence file of the repaired dsDNA is generated at 320, and the sequence file is interpreted at 322 to identify the encoded binary digits.

FIG. 4 shows a process 400 of adding DNA at two different target sites in dsDNA according to two different encodings. Two different encodings using fully orthogonal components allow two different binary strings to be written in the same dsDNA without cross talk. If two encodings are fully orthogonal, the enzymes used to make DSBs for the first encoding recognize target sites different from those recognized by the enzymes used for the second encoding. Similarly, the target site sequences for the first encoding are sufficiently different from the target site sequences for the second encoding that there is no hybridization between homologous repair templates for one encoding and DNA flanking DSBs associated with the other encoding. Use of two different encodings also allows the same binary sequence to be encoded with redundancy. Although only two different encodings are described herein, this technique may be equally applicable to any number of different encodings.

The process 400 is illustrated by a schematic showing dsDNA manipulations. The accompanying schematic uses representations similar to those shown in FIG. 1 and FIG. 2. Unless described otherwise, features of FIG. 4 have the same or similar functions to visually similar features in FIG. 1 and FIG. 2.

At 402, a first DSB is created in a first target site 404 in dsDNA 406 with a first enzyme 408. The first enzyme 408 may be any of the types of enzymes capable of creating DSBs in dsDNA described above. The dsDNA 406 may be linear DNA or circular DNA. In one implementation, the dsDNA 406 may be a circular DNA molecule present in a cell-free system.

At 410, the dsDNA 406 is contacted with the first homologous repair template 412. The first homologous repair template 412 includes a 3′-end sequence that is complementary to a first portion of the first target site 404 and a 5′-end sequence that is complementary to a second portion of the first target site 404. Some portion of a population of the dsDNA 406 that is brought into contact with a population of the first homologous repair templates 412 will undergo HDR and incorporate a middle portion of the first homologous repair template into the dsDNA 406. The middle portion of the homologous repair template 412 is the sequence between the 3′-end sequence and the 5′-end sequence. In a cell-free implementation, the contacting may be performed in part by a microfluidic mechanism moving the first homologous repair template 412 into the same chamber as the dsDNA 406 so that the first homologous repair template 412 may hybridize to one strand of the dsDNA 406.

Once integrated into the dsDNA 406 by the operation of DNA polymerase and DNA ligase, a portion of the target site 404 is located adjacent to the dsDNA 414 generated by repair of the first DSB with the first homologous repair template 412. The order of these two DNA sequences encodes a first binary digit 416 according to a first encoding scheme. The DNA sequence generated by repair with the first homologous repair template 412 is itself a second target site 414 according to the first encoding scheme. Thus far, process 400 is similar to the DNA manipulation shown in FIG. 1.

At 418, a second DSB is created in a third target site 420 with a second enzyme 422. The third target site 420 belongs to a different encoding than the first target site 404 or the second target site 414. Thus, to prevent cross talk between the two encodings, the third target site 420 has a sequence that is different from the first target site 404 and different from the second target site 414. The third target site 420 does not overlap the first target site 404; it is therefore at least one base pair removed from the first target site 404. The second enzyme 422 recognizes the third target site 420 but does not recognize or cut the first target site 404. The first enzyme 408 and the second enzyme 422 may be the same type of enzyme (e.g., both CRISPR/Cas) or different types of enzymes (e.g., one restriction enzyme and one TALEN).

At 424, the dsDNA 406 is contacted with a second homologous repair template 426. The second homologous repair template 426 also includes a 3′-end sequence that is complementary to a first portion of the third target site 420 and a 5′-end sequence that is complementary to a second portion of the third target site 420. Following repair of the second DSB with the second homologous repair template 426, the sequence of a portion of the third target site 420 adjacent to a portion of the dsDNA generated by repairing the second DSB encodes a second binary digit 428 according to a second encoding scheme. The DNA sequence complementary to a middle portion of the second homologous repair template 426 comprises a fourth target site 430 according to the second encoding scheme. The fourth target site 430 is different from the third target site 420 and also different from either of the target sites for the first encoding (i.e., the first target site 404 and the second target site 414).

At this point, a DSB has been created and repaired with a homologous repair template according to the first encoding, and a different DSB has been created and repaired with a different homologous repair template according to the second encoding. Thus, the dsDNA 406 includes a first binary digit 416 according to the first encoding and a second binary digit 428 according to the second encoding. This process of creating DSBs and repairing the breaks with a homologous repair template may be repeated iteratively according to either the first or the second encoding. In some implementations, the encodings may be alternated, or interleaved, so that a binary digit is added according to the first encoding, then a binary digit is added according to the second encoding, and so on.
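The interleaved writing order, together with a basic orthogonality precondition, can be sketched as follows. This Python example is illustrative only: the second encoding's site labels ("K", "L", "M") are hypothetical, orthogonality is reduced to the two encodings sharing no target-site labels (real orthogonality also requires no cross-hybridization, which is not modeled), and the encoding names "first" and "second" are placeholders.

```python
def check_orthogonal(sites_a, sites_b):
    """True if the two encodings share no target-site labels."""
    return set(sites_a).isdisjoint(sites_b)


def interleave_writes(bits_first, bits_second):
    """Alternate writes between two encodings: first, second, first, ...
    Returns a list of (encoding name, bit) pairs in the order they would be written."""
    schedule = []
    for i in range(max(len(bits_first), len(bits_second))):
        if i < len(bits_first):
            schedule.append(("first", bits_first[i]))
        if i < len(bits_second):
            schedule.append(("second", bits_second[i]))
    return schedule


FIRST_SITES = ("X", "Y", "Z")   # hypothetical labels for the first encoding
SECOND_SITES = ("K", "L", "M")  # hypothetical labels for the second encoding
assert check_orthogonal(FIRST_SITES, SECOND_SITES)

# Redundant storage: write the same bits under both encodings.
print(interleave_writes([0, 1], [0, 1]))
# [('first', 0), ('second', 0), ('first', 1), ('second', 1)]
```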

Illustrative Cellular Implementation Environment

FIG. 5 shows an illustrative cell 500 that is capable of heritably storing binary data. The cell 500 may be an E. coli cell, a Saccharomyces cerevisiae cell, or a cell from another single-celled organism. It may also be a cell from a multi-cellular organism grown in culture. The cell 500 may contain a dsDNA molecule 502 that has a first target site 504. The cell 500 may also contain a first enzyme 506 that is configured to create a DSB within the first target site 504. For example, the first enzyme 506 may be a CRISPR/Cas system comprising a gRNA 508 that includes a spacer region complementary to one strand of the dsDNA 502 at the first target site 504.

The cell 500 may also include a gene 510 under the control of a promoter 512 and an operator 514. A promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA (towards the 5′ region of the sense strand). An operator is a segment of DNA to which a transcription factor binds to regulate gene expression. Here, the transcription factor is a repressor, which can bind to the operator to prevent transcription.

The gene 510 may encode an ssRNA sequence 516 comprising a 3′-end sequence 518 and a 5′-end sequence 520. A homologous repair template 522 may be generated from the gene 510. In one implementation, the homologous repair template 522 is the ssRNA sequence 516 itself. The 3′-end sequence 518 and the 5′-end sequence 520 are complementary to one strand of the dsDNA molecule 502 over at least part of the first target site 504. This homology of the 3′-end sequence 518 and the 5′-end sequence 520 to the dsDNA 502 allows the ssRNA sequence 516 to hybridize with portions of the dsDNA 502 on either side of a DSB created in the first target site 504.

In implementations in which the gene 510 directly encodes the homologous repair template 522, the gene 510 will encode a target site 524 that may be cut by an enzyme such as the first enzyme 506. Unless protected from the enzyme, the target site 524 in the gene 510 may be unintentionally cut when the enzyme is introduced.

One technique for protecting the target site 524 from the first enzyme 506 is physical separation. In a cell-free system, such as one that uses microfluidics, the gene 510 may be maintained in one chamber and the ssRNA sequence 516 may be moved from the chamber containing the gene 510 into a different chamber where the enzyme is introduced.

Physical separation may also be used in cellular implementations. The gene 510 and the enzyme may be contained in different cellular compartments. In one implementation, the gene 510 may be in the nucleus and the enzyme may be outside the nucleus, in the cytoplasm or in another cellular compartment. The gene 510 may remain in the nucleus if it is part of the cell's genome. A nuclear export signal may be used to keep the enzyme out of the nucleus. A nuclear export signal (NES) is a short amino acid sequence of four hydrophobic residues in a protein that targets it for export from the cell nucleus to the cytoplasm through the nuclear pore complex. Persons of ordinary skill in the art will be able to modify or engineer the enzyme to include an NES.

The ssRNA sequence 516 may be exported from its site of transcription in the nucleus to the cytoplasm or other destination outside the nucleus where the enzyme is present. RNA export is described in Sean Carmody and Susan Wente, mRNA Nuclear Export at a Glance, 122 J. of Cell Sci. 1933-1937 (2009) and Alwin Köhler and Ed Hurt, Exporting RNA from the Nucleus to the Cytoplasm, 8 Nature Reviews Molecular Cell Biology 761-773 (2007).

In one implementation, the gene 510 may include a sequence with a portion that is later removed by splicing. This additional portion prevents the enzyme from recognizing the target site 524, and the ssRNA sequence 516 does not become a homologous repair template 522 until the additional portion is spliced out. There are multiple types of alternative splicing, of which the most common is exon skipping. Exon skipping is one way to cause splicing in the ssRNA sequence 516; in this case, an exon may be spliced out of the primary transcript. Persons having ordinary skill in the art will understand how to design the gene 510 so that it includes a splice site at a specified location. Alternative splicing may be implemented even if the gene 510 and the enzyme are not physically separated.

Self-excising elements may function similarly to splicing. The gene 510 may be designed to include a region that, when transcribed into RNA, includes one or more self-excising elements. The self-excising elements prevent the gene 510 from being recognized by the enzyme, and the excision converts the ssRNA sequence 516 into the homologous repair template 522. One type of self-excising element is the ribozyme, an RNA enzyme that functions as a reaction catalyst. Self-excising ribozymes are RNA sequences that catalyze a transesterification reaction to remove themselves from the rest of the RNA sequence. Essentially, these are considered introns, which are intragenic regions spliced from a primary transcript to produce mature RNA with a continuous exon (coding region) sequence. Self-excising introns/ribozymes include group I and group II introns. They are considered self-splicing because they do not require proteins to initiate the reaction. Self-excising sequences are known, and one of ordinary skill in the art will understand how to include a self-excising sequence in the gene 510. Aspects of self-excising ribozymes are shown in Internet domain 2011.igem.org/Team:Waterloo.

A series of homologous bridges may also be used to generate a recombinant sequence that is the gene template for the ssRNA sequence 516. The homologous bridges may be present in the DNA at various, separate locations so that the gene 510 does not include a target site 524. This technique is also known as multi-fragment cloning or extension cloning. The final homologous repair template 522 is made up of transcripts of the multiple overlapping segments. One suitable technique for combining the multiple overlapping fragments into the homologous repair template 522 is Sequence and Ligation-Independent Cloning (SLIC). Mamie Li and Stephen Elledge, Harnessing Homologous Recombination in vitro to Generate Recombinant DNA via SLIC, 4 Nature Methods 250-256 (2007). Another suitable technique for joining multiple overlapping fragments is provided by Jiayuan Quan and Jingdong Tian, Circular Polymerase Extension Cloning of Complex Gene Libraries and Pathways, 4(7) PLoS ONE (2009).
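The sequence that results from joining overlapping fragments can be pictured with a short sketch. It assumes the fragments are supplied in order, written on the same strand, and that each adjacent pair shares an exact overlap of at least `min_overlap` bases. The function name `assemble_overlapping` is hypothetical, and the code models only the final sequence, not the SLIC or CPEC chemistry.

```python
def assemble_overlapping(fragments: list[str], min_overlap: int) -> str:
    """Join ordered fragments by merging the longest exact suffix/prefix overlap.

    Hypothetical helper: models only the resulting sequence, not the chemistry.
    """
    if not fragments:
        raise ValueError("no fragments supplied")
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    assembled = fragments[0]
    for fragment in fragments[1:]:
        overlap = 0
        # Try the longest possible overlap first, down to the minimum allowed.
        for k in range(min(len(assembled), len(fragment)), min_overlap - 1, -1):
            if assembled.endswith(fragment[:k]):
                overlap = k
                break
        if overlap == 0:
            raise ValueError(f"no overlap of >= {min_overlap} nt before {fragment!r}")
        # Append only the part of the fragment not already present.
        assembled += fragment[overlap:]
    return assembled


print(assemble_overlapping(["ATGCCGTA", "CGTAGGCT", "GGCTTTAA"], min_overlap=4))
# ATGCCGTAGGCTTTAA
```

Preferring the longest overlap is a simple way to avoid merging on a short, coincidental match; a real design would also check that each overlap is unique.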

In one implementation, the homologous repair template 522 is a ssDNA sequence complementary to the ssRNA sequence 516. The ssDNA sequence may be created by reverse transcriptase reading the ssRNA sequence 516 and synthesizing a complementary ssDNA sequence. Reverse transcriptase (RT) is an enzyme used to generate complementary DNA (cDNA) from an RNA template, a process termed reverse transcription. RT is widely used in the laboratory to convert RNA to DNA for use in procedures such as molecular cloning, RNA sequencing, polymerase chain reaction (PCR), and genome analysis. RT enzymes are widely available from multiple commercial sources.
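At the sequence level, the ssDNA produced by reverse transcription is the reverse complement of the ssRNA, with uracil in the RNA pairing with adenine in the DNA. The sketch below assumes both sequences are written 5′ to 3′ and contain only the four standard bases; `reverse_transcribe` is a hypothetical name used for illustration.

```python
# Map each RNA base to the DNA base that pairs with it (U pairs with A).
_RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA (5'->3') complementary to an RNA written 5'->3'."""
    rna = rna.upper()
    invalid = set(rna) - set("ACGU")
    if invalid:
        raise ValueError(f"non-RNA characters: {sorted(invalid)}")
    # Complement each base, then reverse so the cDNA also reads 5'->3'.
    return rna.translate(_RNA_TO_CDNA)[::-1]


print(reverse_transcribe("AUGGCU"))  # AGCCAT
```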

The 3′-end sequence 526 and the 5′-end sequence 528 of the homologous repair template 522 are complementary to one strand of the dsDNA 502 over at least a portion of the first target site 504. The homologous repair template 522, in both ssDNA and ssRNA implementations, includes a middle portion 530 that, when incorporated into the dsDNA 502, forms a target site as described above in this disclosure.
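The template layout described above (two end sequences homologous to the DNA flanking the cut, with a middle portion between them) can be sketched as string operations. This is an idealized model under stated assumptions. Sequences are written as the top strand 5′ to 3′, so arms copied from the top strand are complementary to the bottom strand. HDR is modeled as inserting the middle portion exactly at the cut. The names `build_repair_template` and `apply_hdr` are hypothetical.

```python
def build_repair_template(top: str, cut: int, arm_len: int, middle: str) -> str:
    """Flank `middle` with copies of the top strand on each side of the cut."""
    if not (arm_len <= cut <= len(top) - arm_len):
        raise ValueError("homology arms do not fit around the cut")
    left_arm = top[cut - arm_len:cut]   # 5'-end sequence of the template
    right_arm = top[cut:cut + arm_len]  # 3'-end sequence of the template
    return left_arm + middle + right_arm


def apply_hdr(top: str, cut: int, arm_len: int, template: str) -> str:
    """Idealized HDR: insert the template's middle portion at the cut."""
    left_arm = top[cut - arm_len:cut]
    right_arm = top[cut:cut + arm_len]
    if not (template.startswith(left_arm) and template.endswith(right_arm)):
        raise ValueError("template arms are not homologous to this cut site")
    middle = template[arm_len:len(template) - arm_len]
    return top[:cut] + middle + top[cut:]


template = build_repair_template("AAAACCCCGGGGTTTT", cut=8, arm_len=4, middle="ACGT")
print(template)                                                  # CCCCACGTGGGG
print(apply_hdr("AAAACCCCGGGGTTTT", cut=8, arm_len=4, template=template))
# AAAACCCCACGTGGGGTTTT
```

In this model, the inserted middle portion ("ACGT" here) is where the sequence of the next target site would be placed.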

Enzyme 506 is illustrated here as a CRISPR/Cas complex with gRNA 508. Other types of enzymes discussed above may be used instead of the CRISPR/Cas complex. The single-stranded tail of the gRNA 508 may be extended with a sequence complementary to all or part of the homologous repair template 522. The homologous repair template 522 may bind to the tail of the gRNA 508, forming a double-stranded region 532. This brings a copy of the homologous repair template 522 into close proximity with the location of the DSB created by the CRISPR/Cas complex 506.

The extended tail of the gRNA 508 may also be designed so that it matches the binding domain of a transcription activator-like effector (TALE) protein. The TALE protein may also have a binding domain complementary to the homologous repair template 522. This will also bring the homologous repair template 522 into close proximity with the location of the DSB. The tail of the gRNA 508 may be extended to create regions for attachment of multiple copies of the homologous repair template 522 or TALE proteins.

TALE proteins are proteins secreted by Xanthomonas bacteria via their type III secretion system when they infect various plant species. These proteins can bind promoter sequences in the host plant and activate the expression of plant genes that aid bacterial infection. They recognize plant DNA sequences through a central repeat domain consisting of a variable number of repeats of about 34 amino acids. There appears to be a one-to-one correspondence between the identity of two critical amino acids in each repeat and each DNA base in the target sequence. The most distinctive characteristic of TAL effectors is a central repeat domain containing between 1.5 and 33.5 repeats that are usually 34 residues in length (the C-terminal repeat is generally shorter and is referred to as a “half repeat”). A typical repeat sequence may be shared across many TALE proteins, but the residues at the 12th and 13th positions are hypervariable (these two amino acids are also known as the repeat variable diresidue or RVD). This simple correspondence between amino acids in TAL effectors and DNA bases in their target sites makes them useful for protein engineering applications.
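The one-RVD-per-base correspondence can be illustrated with the commonly cited pairings NI→A, HD→C, NG→T, and NN→G. In practice NN also binds A. This is a simplified sketch: it ignores positional and context effects and the C-terminal half repeat, and `design_rvds` and `predicted_target` are hypothetical names.

```python
# Commonly cited RVD-to-base pairings (NN also binds A in practice).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
BASE_FOR_RVD = {rvd: base for base, rvd in RVD_FOR_BASE.items()}


def design_rvds(target: str) -> list[str]:
    """List the repeat variable diresidues for a DNA target, one per base."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def predicted_target(rvds: list[str]) -> str:
    """Read back the DNA sequence a repeat array is expected to bind."""
    return "".join(BASE_FOR_RVD[rvd] for rvd in rvds)


rvds = design_rvds("TACG")
print(rvds)                    # ['NG', 'NI', 'HD', 'NN']
print(predicted_target(rvds))  # TACG
```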

Subsequent to creation of a DSB in the target site 504, the homologous repair template 522 that has hybridized to the tail of the gRNA 508, forming the double-stranded region 532, may be released. In some implementations, introduction of a nucleotide sequence complementary to the tail of the gRNA 508 or to the binding domain of the TALE protein may compete with the homologous repair template 522 and cause dissociation of the homologous repair template 522. This competition may make the homologous repair template 522 available for binding to the dsDNA 502 on either side of the DSB.

The cell 500 may also include one or more engineered signaling pathways 534. As used herein, “engineered signaling pathway” includes any pathway in which at least one portion of the pathway is intentionally modified with molecular biology techniques to be different from the wild-type pathway and in which a signal (intracellular or extracellular) causes a change in a rate of transcription of a gene. The engineered signaling pathway 534 may induce a promoter such as the promoter 512 described above. The engineered signaling pathway 534 may also cause a transcription factor to bind to an operator such as the operator 514 described above and prevent transcription. In one implementation, the gene affected by the engineered signaling pathway 534 may be the gene 510 that encodes the ssRNA sequence 516. Thus, the engineered signaling pathway 534 may function to control an amount of the homologous repair template 522 available in the cell 500. In one implementation, the gene affected by the engineered signaling pathway 534 may encode an enzyme that creates DSBs in dsDNA, such as the enzyme 506. Thus, the number of enzymes that create DSBs in the target sequences 504 may be regulated by the engineered signaling pathway 534.

The cell 500 may include multiple different engineered signaling pathways 534, each responding to a unique signal and each promoting or repressing expression of genes responsible for the creation of the homologous repair templates 522 and/or enzymes 506. Thus, intracellular or extracellular signals may be used to vary the levels of homologous repair templates 522 and/or enzymes 506 in the cell 500, thereby changing which target sequences 504 are cut and which sequences are used to repair DSBs through HDR. By engineering signaling pathways that affect the availability of homologous repair templates 522 and/or enzymes 506, cells may be modified to include a control system for encoding arbitrary sequences of binary data in the dsDNA 502 of the cell 500 according to the encoding techniques described above.
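To make such a control system concrete, the sketch below plans which site must be cut and which template must be supplied for each bit. It assumes a hypothetical context-dependent rule that is not necessarily the scheme described earlier in this disclosure. There are three target sites, labeled A, B, and C. For each bit, the repair template creates one of the two sites that differ from the site currently exposed: the first such site (in label order) encodes 0 and the second encodes 1. In a cell, each planned step would correspond to signals that induce the enzyme for the current site and the template for the next site. The names `SITES`, `next_site`, and `plan_writes` are illustrative.

```python
SITES = ("A", "B", "C")  # hypothetical labels for three target-site sequences


def next_site(current: str, bit: int) -> str:
    """Pick the site the next template creates: 0 -> first other site, 1 -> second."""
    others = [site for site in SITES if site != current]
    return others[bit]


def plan_writes(bits: list[int], start: str = "A") -> list[tuple[str, str]]:
    """Return (site to cut, site the template creates) for each bit in order."""
    if start not in SITES:
        raise ValueError(f"unknown start site: {start!r}")
    plan = []
    current = start
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError(f"not a bit: {bit!r}")
        created = next_site(current, bit)
        plan.append((current, created))
        current = created  # the newly created site is the next one to cut
    return plan


print(plan_writes([0, 1, 1]))  # [('A', 'B'), ('B', 'C'), ('C', 'B')]
```

Because every step excludes the current site, three templates suffice for a bit string of any length.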

In one implementation, the engineered signaling pathway 534 may include an external receptor 536 that can detect extracellular signals across a membrane 538. The membrane 538 may be a cell wall, lipid bilayer, or artificial cell wall. In one implementation, the external receptor 536 may be a G protein-coupled receptor (GPCR). GPCRs constitute a large protein family of receptors that sense molecules outside the cell 500 and activate intracellular signal transduction pathways 534 and, ultimately, cellular responses. The GPCR is activated by an external signal in the form of a ligand or other signal mediator. This creates a conformational change in the GPCR, causing activation of a G protein. Further effects depend on the type of G protein. G proteins are subsequently inactivated by GTPase activating proteins, such as RGS proteins. The ligands that bind and activate these GPCRs include light-sensitive compounds, odors, pheromones, hormones, and neurotransmitters, and vary in size from small molecules to peptides to large proteins. When a ligand binds to the GPCR, it causes a conformational change in the GPCR, which allows it to act as a guanine nucleotide exchange factor (GEF). The GPCR can then activate an associated G protein by exchanging its bound GDP for a GTP. The G protein's α subunit, together with the bound GTP, can then dissociate from the β and γ subunits to further affect intracellular signaling proteins or target functional proteins directly, depending on the α subunit type.

In one implementation, the external receptor 536 may be a photosensitive membrane protein. Photoreceptor proteins are light-sensitive proteins involved in the sensing of and response to light in a variety of organisms. Photoreceptor proteins typically consist of a protein moiety and a non-protein photopigment that reacts to light via photoisomerization or photoreduction, thus initiating a change in the receptor protein that triggers a signal transduction cascade. Pigments found in photoreceptors include retinal (retinylidene proteins, for example rhodopsin in animals), flavin (flavoproteins, for example cryptochrome in plants and animals), and bilin (biliproteins, for example phytochrome in plants). One example of engineered use of light-sensitive proteins is found in Alvin Tamsir et al., Robust Multicellular Computing Using Genetically Encoded NOR Gates and Chemical ‘Wires’, 469 Nature 212-215 (2011).

The external receptor 536, in some implementations, may also be a membrane-bound immunoglobulin (mIg). A membrane-bound immunoglobulin is the membrane-bound form of an antibody. Membrane-bound immunoglobulins are composed of surface-bound IgD or IgM antibodies and associated Ig-α and Ig-β heterodimers, which are capable of signal transduction through a signaling pathway 534 in response to activation by an antigen.

In one implementation, temperature may activate the engineered signaling pathway 534. Thus, by altering the temperature of the cell 500, genes controlling, for example, the synthesis of homologous repair templates or enzymes may be up- or down-regulated. Temperature-sensing molecules that naturally occur in single-celled organisms include heat shock proteins and certain RNA regulatory molecules, such as riboswitches. Heat shock proteins are proteins that are involved in the cellular response to stress. One example of a heat shock protein that responds to temperature is the bacterial protein DnaK. Temperatures elevated above the normal physiological range can cause DnaK expression to become up-regulated. DnaK and other heat shock proteins can be utilized for engineered pathways that respond to temperature. Riboswitches are a type of RNA molecule that can respond to temperature in order to regulate protein translation. An example of a temperature-regulated engineered pathway that has utilized a riboswitch can be found in Neupert, J, et al., Nucleic Acids Res, 2008, 36(19):e124. Another example of a temperature-sensitive molecule that can be utilized to regulate engineered cell pathways is a temperature-sensitive mutant protein. Single mutations can be made to proteins that cause the proteins to become unstable at high temperatures yet remain functional at lower temperatures. Methods for synthesizing temperature-sensitive mutant proteins can be found in Ben-Aroya, S, et al., Methods Enzymol, 2010, 470, 181-204. An example of a temperature-controlled engineered pathway that utilizes a temperature-sensitive mutant can be found in Hussain, F, et al., 2014, PNAS, 111(3), 972-977.

In one implementation, ion concentration or pH may activate the engineered signaling pathway 534. With engineered signaling pathways 534 of this type, placing the cell 500 in a different ionic environment or altering the pH surrounding the cell 500 may be used to control the availability of a given homologous repair template or enzyme. Examples of cellular sensing mechanisms that detect ionic strength or pH include many viral proteins, such as herpes simplex virus gB, rubella virus envelope protein, influenza hemagglutinin, and vesicular stomatitis virus glycoprotein. An example of a natural cellular pathway that is regulated by pH is penicillin production by Aspergillus nidulans (Espeso, E, et al., 1993, EMBO J, 12(10), 3947-3956). Another example of a pH-sensitive molecule that can be utilized to regulate engineered cell pathways is a pH-sensitive mutant protein. Single mutations can be made to proteins that cause the proteins to become less stable in either acidic or basic conditions. For example, pH-sensitive antibodies can bind to an antigen at an optimal pH but are unable to bind to the antigen at a non-optimal pH. A technique for creating pH-sensitive antibodies that can be used for engineered signaling pathways can be found in Schroter, C, et al., 2015, MAbs, 7(1), 138-51. These and other similar sensing mechanisms may be engineered to affect the behavior of a promoter 512 or operator 514.

The cell 500 may also include an additional dsDNA molecule 540 that also includes a target site 542. Similar to the first dsDNA molecule 502, the additional dsDNA molecule 540 may include only a single instance of the target site 542. Alternatively, the additional dsDNA molecule 540 may include multiple copies of the same target site or multiple different target sites.

The additional dsDNA molecule 540 and the target site 542 may have identical or similar sequences to the first dsDNA molecule 502 and the first target site 504. Thus, the additional dsDNA molecule 540 may be thought of as a “copy” of the first dsDNA molecule 502. This additional copy of an identical or similar molecule may provide redundancy by creating a second string of binary data that, absent errors, will record the same series of 0s and 1s in both dsDNA molecules 502, 540. In one implementation, the additional dsDNA molecule 540 may include a target site 542 with a different sequence than the target site 504 in the first dsDNA molecule 502. Having different target sites 504, 542 in different dsDNA molecules 502, 540 allows for simultaneous, or alternating, encoding of binary data in two different encoding schemes. The two different encoding schemes may be non-overlapping or “orthogonal” so that the enzymes and homologous repair templates associated with one encoding scheme do not interact with the dsDNA molecule used for the other encoding scheme. It is understood that in actual implementations there may be many hundreds of thousands of dsDNA molecules with respective target sites. There may also be a corresponding number of different encoding schemes and different sequences for the respective target sites.

Illustrative System and Computing Devices

FIG. 6 shows an illustrative architecture 600 for implementing and interacting with DNA molecules storing data introduced through HDR as described above. The architecture may include any of a digital computer 602, an oligonucleotide synthesizer 604, an automated system 606, and/or a polynucleotide sequencer 608. The architecture 600 may also include other components besides those discussed herein.

As used herein, “digital computer” means a computing device including at least one hardware microprocessor 610 and memory 612 capable of storing information in a binary format. The digital computer 602 may be a supercomputer, a server, a desktop computer, a notebook computer, a tablet computer, a game console, a mobile computer, a smartphone, or the like. The hardware microprocessor 610 may be implemented in any suitable type of processor such as a single-core processor, a multicore processor, a central processing unit (CPU), a graphics processing unit (GPU), or the like. The memory 612 may include removable storage, non-removable storage, local storage, and/or remote storage to provide storage of computer-readable instructions, data structures, program modules, and other data. The memory 612 may be implemented as computer-readable media. Computer-readable media includes at least two types of media, namely computer-readable storage media and communications media. Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.

In contrast, communications media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media and communications media are mutually exclusive.

The digital computer 602 may also include one or more input/output device(s) 614 such as a keyboard, a pointing device, a touchscreen, a microphone, a camera, a display, a speaker, a printer, and the like.

A homology directed repair (HDR) template designer 616 may be included as part of the digital computer 602, for example, as instructions stored in the memory 612. The HDR template designer 616 may design homologous repair templates based on sequences of target sites, sequences of dsDNA molecules, encoding schemes, enzyme recognition sites, etc. In one implementation, the HDR template designer 616 may design homologous repair templates to avoid crosstalk between different encoding schemes. The HDR template designer 616 may also compare percent similarity and hybridization conditions for potential homologous repair templates as well as portions of the homologous repair templates. For example, the HDR template designer 616 may design homologous repair templates to avoid the formation of hairpins as well as to prevent or minimize annealing between homologous repair templates. The HDR template designer 616 may also design homologous repair templates to maximize a difference between the 3′-end sequence, 5′-end sequence, and/or middle sequence. For example, the difference may be G:C content, and the HDR template designer 616 may design sequences with a preference for increasing the G:C content difference between the end sequences and the middle sequence.
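Two of the checks attributed to the HDR template designer 616 can be sketched with simple string heuristics: the G:C content difference between the end sequences and the middle, and a crude hairpin screen. The assumptions are loud ones. A hairpin is flagged whenever any 4-base stem has an exact reverse complement downstream after a loop of at least 3 bases, and no thermodynamic model is used. The names `gc_fraction`, `gc_contrast`, and `has_hairpin` are hypothetical, and a production designer would more likely rely on a dedicated secondary-structure tool.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_contrast(template: str, arm_len: int) -> float:
    """Absolute G:C difference between the two end sequences and the middle."""
    if arm_len < 1 or 2 * arm_len > len(template):
        raise ValueError("arm_len does not fit the template")
    ends = template[:arm_len] + template[-arm_len:]
    middle = template[arm_len:len(template) - arm_len]
    return abs(gc_fraction(ends) - gc_fraction(middle))


def has_hairpin(seq: str, stem: int = 4, min_loop: int = 3) -> bool:
    """True if some `stem`-mer pairs with an exact downstream reverse complement."""
    for i in range(len(seq) - stem + 1):
        partner = reverse_complement(seq[i:i + stem])
        # The partner must start after the stem plus a loop of >= min_loop bases.
        if seq.find(partner, i + stem + min_loop) != -1:
            return True
    return False


print(gc_contrast("AAAAGGCCAAAA", arm_len=4))  # 1.0
print(has_hairpin("GGGGAAAACCCC"))             # True
```

A designer could rank candidate templates by a high `gc_contrast` and discard any for which `has_hairpin` returns True.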

The digital computer 602 may also include an encoding database 618. Alternatively, the encoding database 618 may be part of a hardware device that is physically separate from the digital computer 602. The encoding database 618 includes the correspondence between a DNA sequence and a binary value. For example, the information included in Table 1 is one example of an encoding that may be stored in the encoding database 618. The encoding database 618 may store any number of different encodings. The encoding database 618 may be referenced by the HDR template designer 616 to determine which HDR template is to be used next in order to encode a given bit. Similarly, a decoder 620 may also reference the encoding database 618 to determine what bit value to assign to a given sequence. The decoder 620 may be implemented as instructions stored in the memory 612. Thus, sequence data 622 may be provided to the decoder 620, which converts the sequence data 622 into binary data 624.
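The lookup the decoder 620 performs can be sketched with a small in-memory table. The sketch assumes a hypothetical context-dependent scheme, not necessarily the contents of Table 1. There are three target sites labeled A, B, and C, and each transition from one site to a different site encodes 0 or 1 depending on which of the two other sites was created. It also assumes the sequence data 622 has already been reduced to the ordered list of site labels found in the dsDNA. `ENCODING_DB` and `decode` are illustrative names, not the disclosure's implementation.

```python
SITES = ("A", "B", "C")  # hypothetical target-site labels


def _build_encoding_table() -> dict[tuple[str, str], int]:
    """(site before, site created by the template) -> bit value."""
    table = {}
    for before in SITES:
        others = [site for site in SITES if site != before]
        table[(before, others[0])] = 0
        table[(before, others[1])] = 1
    return table


ENCODING_DB = _build_encoding_table()


def decode(sites: list[str]) -> list[int]:
    """Convert the ordered target sites read from the dsDNA into bits."""
    bits = []
    for before, after in zip(sites, sites[1:]):
        if (before, after) not in ENCODING_DB:
            raise ValueError(f"no encoding for {before}->{after}")
        bits.append(ENCODING_DB[(before, after)])
    return bits


print(decode(["A", "B", "C", "B"]))  # [0, 1, 1]
```

Because each bit depends on the preceding site, the same table serves both the HDR template designer (choosing the next template) and the decoder (reading transitions back).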

The decoder 620 may also provide error correction. In implementations in which binary data 624 is redundantly encoded, either by writing with the same encoding scheme in multiple target sites or by writing the same binary data 624 in multiple encoding schemes, the decoder 620 may identify the most probable sequence out of multiple different sequences all representing the same binary data 624. Therefore, errors that may be present in some of the sequences will be corrected if a majority of the sequences include the correct value for a given bit.
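Per-bit majority voting across redundant copies can be sketched as follows. The sketch assumes the copies have already been aligned to the same length. Copies shortened by missed insertions would first need alignment, which is not shown. Using an odd number of copies avoids ties. `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(reads: list[list[int]]) -> list[int]:
    """Per-position majority across redundant reads of the same binary data."""
    if not reads:
        raise ValueError("no reads supplied")
    if any(len(read) != len(reads[0]) for read in reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) yields the values every copy reports for one bit position.
    return [Counter(column).most_common(1)[0][0] for column in zip(*reads)]


copies = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0]]
print(majority_vote(copies))  # [0, 1, 1, 0]
```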

The decoder 620 may provide error correction through other techniques such as the use of parity bits and block codes. A parity bit is a bit added to a string of binary code that indicates whether the number of 1-bits in the string is even or odd. Parity bits are a simple form of error-detecting code. In the case of even parity, for a given set of bits, the occurrences of bits whose value is 1 are counted. If that count is odd, the parity bit value is set to 1, making the total count of occurrences of 1s in the whole set (including the parity bit) an even number. If the count of 1s in a given set of bits is already even, the parity bit's value is 0. In the case of odd parity, the coding is reversed. For a given set of bits, if the count of bits with a value of 1 is even, the parity bit value is set to 1, making the total count of 1s in the whole set (including the parity bit) an odd number. If the count of bits with a value of 1 is odd, the count is already odd, so the parity bit's value is 0. Thus, by using homologous repair templates to add parity bits based on the binary sequence that is intended to be encoded, reading of the binary data 624 from a DNA sequence will reproduce those parity bits, and this may be used by the decoder 620 as one way of identifying an error.
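The even and odd parity rules above can be expressed directly. This is a minimal sketch in which the parity bit is appended as the last bit of the word. The names `parity_bit` and `parity_ok` are hypothetical. A single parity bit detects any odd number of flipped bits but cannot locate them.

```python
def parity_bit(bits: list[int], even: bool = True) -> int:
    """Parity bit that makes the total number of 1s even (or odd)."""
    odd_count = sum(bits) % 2  # 1 if the data has an odd number of 1s
    return odd_count if even else 1 - odd_count


def parity_ok(bits_with_parity: list[int], even: bool = True) -> bool:
    """Check a word whose last bit is the parity bit."""
    return sum(bits_with_parity) % 2 == (0 if even else 1)


data = [1, 0, 1, 1]
word = data + [parity_bit(data)]  # [1, 0, 1, 1, 1]
print(parity_ok(word))            # True
word[1] ^= 1                      # simulate a single misread bit
print(parity_ok(word))            # False
```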

There are many types of block codes, such as Reed-Solomon codes, Hamming codes, Hadamard codes, Expander codes, Golay codes, and Reed-Muller codes. The term block code may also refer to any error-correcting code that acts on a block of k bits of input data to produce n bits of output data, referred to as an (n, k) code. Block codes may be applied in the conventional manner, with the message being the intended string of binary data 624 and the noise from the communication channel being introduced either in the incorporation of homologous repair templates into dsDNA or in sequencing of the DNA.
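As one concrete (n, k) example, the classic Hamming (7, 4) code adds three parity bits to four data bits and can correct any single flipped bit. The sketch uses the conventional bit layout p1 p2 d1 p3 d2 d3 d4. It is offered only as an illustration of a block code and is not a code specified by this disclosure. The function names are hypothetical.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> list[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # checks positions 4, 5, 6, 7
    error_position = s1 + 2 * s2 + 4 * s3  # 1-based; 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


codeword = hamming74_encode([1, 0, 1, 1])
print(codeword)                    # [0, 1, 1, 0, 0, 1, 1]
codeword[4] ^= 1                   # one bit misread during sequencing
print(hamming74_decode(codeword))  # [1, 0, 1, 1]
```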

Errors may be introduced in multiple ways, including errors from DNA polymerases, nonspecific and unintended annealing, errors in the synthesis of homologous repair templates, and errors in reads generated by DNA sequencing. These types of errors are known to those of skill in the art and may be mitigated by conventional techniques.

Some errors specific to the techniques of this disclosure include insertion of an incorrect homologous repair template and failure to insert the correct homologous repair template. For example, if a DSB is created at a target site X₁X₂, then any homologous repair template of the format X₁_ _X₂ may hybridize with the dsDNA on either side of the DSB. If, in order to encode the next bit, the homologous repair template X₁Y₁Y₂X₂ should be incorporated into the dsDNA through HDR but the homologous repair template X₁Z₁Z₂X₂ is incorporated instead, the dsDNA will encode an incorrect bit. The wrong homologous repair template may be available for incorporation because it remains from a previous iteration. Therefore, error rates can be reduced by removing unincorporated homologous repair templates between iterations. Another type of error may occur when a homologous repair template should be incorporated but is not. For example, after creation of a DSB in dsDNA, the two ends of the cut DNA may rejoin each other. This re-forms the target site that was just cut. The next iteration may use an enzyme that does not recognize this target site. Therefore, several iterations may pass before an enzyme that recognizes the target site is next brought into contact with the dsDNA. At that point, the dsDNA will be cut and the process of incorporating homologous repair templates will continue. However, a number of homologous repair templates will have been missed, and the final binary data 624 will omit a number of bits.
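The first failure mode can be illustrated by listing every template in a pool whose ends match a cut site. Sequences are represented by the symbolic labels used above rather than real bases, and `RepairTemplate` and `compatible_templates` are hypothetical names. If more than one template is returned, an incorrect insertion is possible, which is why unincorporated templates are removed between iterations.

```python
from typing import NamedTuple


class RepairTemplate(NamedTuple):
    left: str    # X1: homologous to the DNA on one side of the DSB
    middle: str  # inserted sequence, e.g. Y1Y2 or Z1Z2
    right: str   # X2: homologous to the DNA on the other side of the DSB


def compatible_templates(pool: list[RepairTemplate], x1: str, x2: str) -> list[RepairTemplate]:
    """Templates of the form X1 _ _ X2 that could repair a DSB in site X1X2."""
    return [t for t in pool if t.left == x1 and t.right == x2]


pool = [
    RepairTemplate("X1", "Y1Y2", "X2"),  # intended for this iteration
    RepairTemplate("X1", "Z1Z2", "X2"),  # left over from an earlier iteration
    RepairTemplate("W1", "Y1Y2", "W2"),  # targets a different site
]
print(len(compatible_templates(pool, "X1", "X2")))  # 2, so the leftover could be used

washed = [pool[0], pool[2]]  # pool after removing unincorporated templates
print(len(compatible_templates(washed, "X1", "X2")))  # 1
```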

In order to manipulate the DNA, and potentially the RNA, that makes up the homologous repair templates and dsDNA, the digital computer 602 may communicate with other devices through one or more I/O data interfaces 626. The I/O data interface(s) 626 can exchange instructions and data with other devices such as the oligonucleotide synthesizer 604, the automated system 606, and the polynucleotide sequencer 608.

The oligonucleotide synthesizer 604 chemically synthesizes oligonucleotides based on instructions received as electronic data. The synthesized oligonucleotides may be used as homologous repair templates. Thus, in some implementations, the sequence of nucleotides that is provided to the oligonucleotide synthesizer 604 may come from the HDR template designer 616.

A number of methods for DNA synthesis and commercial oligonucleotide synthesizers are available. Methods for DNA synthesis include solid-phase phosphoramidite synthesis, microchip-based oligonucleotide synthesis, ligation-mediated assembly, polymerase chain reaction (PCR)-mediated assembly, and the like. For example, such synthesis can be performed using an ABI 394 DNA Synthesizer (Applied Biosystems, Foster City, Calif.) at 0.2 μmol scale followed by a standard cleavage and deprotection protocol, e.g., using 28% aqueous ammonia or a 3:1 solution of ammonia in methanol. One having ordinary skill in the art can select other cleaving agents, such as methylamine, to be used instead of, or in addition to, ammonia, if desired.

The term “oligonucleotide” as used herein is defined as a molecule including two or more nucleotides. Oligonucleotides include probes and primers. Oligonucleotides used as probes or primers may also include nucleotide analogues such as phosphorothioates, alkylphosphorothioates, peptide nucleic acids, or intercalating agents. The introduction of these modifications may be advantageous in order to positively influence characteristics such as hybridization kinetics, reversibility of the hybrid formation, stability of the oligonucleotide molecules, and the like.

The automated system 606 may include any type of robotics, automation, or other system for automating one or more manipulations that may be performed on the dsDNA with the enzymes and/or the homologous repair templates. The automated system 606 may be used in conjunction with manual operations such that the totality of operations needed to practice the techniques of this disclosure is performed in a hybrid manner, in which some operations are performed by the automated system and others manually.

In one implementation, the automated system 606 may include a microfluidics system. Microfluidics is a multidisciplinary field intersecting engineering, physics, chemistry, biochemistry, nanotechnology, and biotechnology, with practical applications to the design of systems in which small volumes of fluids are handled. Typically, fluids are moved, mixed, separated, or otherwise processed. Numerous applications employ passive fluid control techniques such as capillary forces. In some applications, external actuation is additionally used for directed transport of the media. Examples of external actuation include rotary drives applying centrifugal forces for fluid transport on passive chips. Active microfluidics refers to the defined manipulation of the working fluid by active (micro)components such as micropumps or microvalves. Micropumps supply fluids in a continuous manner or are used for dosing. Microvalves determine the flow direction or the mode of movement of pumped liquids. Often, processes that are normally carried out in a lab are miniaturized on a single chip to enhance efficiency and mobility and to reduce sample and reagent volumes. As used herein, the automated system 606 may include other equipment for manipulating DNA.

The automated system 606 may include a cell-free system that can be implemented in part by microfluidics. The cell-free system may also be implemented as an artificial cell or a minimal cell. An artificial cell or minimal cell is an engineered particle that mimics one or many functions of a biological cell. Artificial cells are biological or polymeric membranes that enclose biologically active materials. As such, nanoparticles, liposomes, polymersomes, microcapsules, detergent micelles, and a number of other particles may be considered artificial cells. Micro-encapsulation allows for metabolism within the membrane, exchange of small molecules, and prevention of passage of large substances across it. Membranes for artificial cells can be made of simple polymers, crosslinked proteins, lipid membranes, or polymer-lipid complexes. Further, membranes can be engineered to present surface proteins such as albumin, antigens, Na/K-ATPase carriers, or pores such as ion channels. Commonly used materials for the production of membranes include hydrogel polymers such as alginate and cellulose, and thermoplastic polymers such as hydroxyethyl methacrylate-methyl methacrylate (HEMA-MMA) and polyacrylonitrile-polyvinyl chloride (PAN-PVC), as well as variations of the above-mentioned materials.

Minimal cells, also known as proto-cells, are cells that meet only the minimum requirements for life. Minimal cells may be created by a top-down approach that knocks out genes in a single-celled organism until a minimal set of genes necessary for life is identified. Mycoplasma mycoides, E. coli, and Saccharomyces cerevisiae are examples of organisms that may be modified to create minimal cells.

The cell-free system includes components for DNA replication and repair such as nucleotides, DNA polymerase, and DNA ligase. The cell-free system will also include dsDNA that includes at least one initial target site for creating a DSB. The dsDNA may be present in a vector that includes one or more operons. The cell-free system will also include buffers to maintain pH and ion availability. Furthermore, the cell-free system may also include the enzymes used for creating DSBs in dsDNA and the homologous repair templates used for repairing dsDNA. Some cell-free systems may include genes encoding the enzymes and homologous repair templates. To prevent enzymes from remaining when their respective cutting functions are no longer desired, the cell-free system may include proteolytic enzymes that specifically break down the DNA-cutting enzymes.

In a cell-free system, particular components may be added when needed, either by moving volumes of liquid together with microfluidics or by increasing the expression of gene products that lead to synthesis of enzymes, homologous repair templates, etc.

The automated system 606 may include a structure, such as at least one chamber, which holds one or more DNA molecules. The chamber may be implemented as any type of mechanical, biological, or chemical arrangement which holds a volume of liquid including DNA at a physical location. For example, a single flat surface having a droplet present thereon, with the droplet held in part by surface tension of the liquid, is one implementation of a chamber even though the droplet is not fully enclosed within a container.

The automated system 606 may perform many types of manipulations on DNA molecules. For example, the automated system 606 may be configured to move a volume of liquid from the chamber to another chamber in response to a series of instructions from the I/O data interface 626.

Microfluidics systems and methods to divide a bulk volume into partitions include emulsification, generation of “water-in-oil” droplets, and generation of monodispersed droplets as well as using channels, valves, and pumps. Partitioning methods can be augmented with droplet manipulation techniques, including electrical (e.g., electrostatic actuation, dielectrophoresis), magnetic, thermal (e.g., thermal Marangoni effects, thermocapillary), mechanical (e.g., surface acoustic waves, micropumping, peristaltic), optical (e.g., opto-electrowetting, optical tweezers), and chemical means (e.g., chemical gradients). In some embodiments, a droplet microactuator is supplemented with a microfluidics platform (e.g., continuous flow components). Some implementations of microfluidics systems use a droplet microactuator. A droplet microactuator can be capable of effecting droplet manipulation and/or operations, such as dispensing, splitting, transporting, merging, mixing, agitating, and the like.

The polynucleotide sequencer 608 may sequence DNA molecules using any technique for sequencing nucleic acids known to those skilled in the art. The polynucleotide sequencer 608 may be configured to sequence all or part of a dsDNA molecule modified according to any of the techniques described above and provide the sequence data 622 to the digital computer 602.

DNA sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary electrophoresis. In one implementation, next generation (NextGen) sequencing platforms are advantageously used in the practice of the invention. NextGen sequencing refers to any of a number of post-classic Sanger type sequencing methods which are capable of high throughput, multiplex sequencing of large numbers of samples simultaneously. Current NextGen sequencing platforms are capable of generating reads from multiple distinct nucleic acids in the same sequencing run. Throughput varies from 100 million bases to 600 gigabases per run and is rapidly increasing due to improvements in technology. The principle of operation of different NextGen sequencing platforms also varies and can include: sequencing by synthesis using reversibly terminated labeled nucleotides, pyrosequencing, 454 sequencing, allele specific hybridization to a library of labeled oligonucleotide probes, sequencing by synthesis using allele specific hybridization to a library of labeled clones that is followed by ligation, real-time monitoring of the incorporation of labeled nucleotides during a polymerization step, polony sequencing, single molecule real-time sequencing, nanopore sequencing, and SOLiD sequencing.

454 sequencing involves two steps. In the first step, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which contains a 5′-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (picoliter-sized). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi) which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5′ phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is detected and analyzed.
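Because the light signal is proportional to the number of nucleotides incorporated in each flow, a read can be recovered from the series of flow signals (a flowgram). The minimal Python sketch below assumes an idealized, noise-free signal and a hypothetical cyclic dispensation order `TACG`; it generates a flowgram from a known read and decodes it by rounding each signal to a homopolymer length. Real instruments use their own flow orders and signal processing.

```python
FLOW_ORDER = "TACG"  # hypothetical nucleotide dispensation order, repeated cyclically


def simulate_flowgram(read: str, n_flows: int) -> list[float]:
    """Return one idealized light signal per nucleotide flow.

    Each flow extends the strand through the whole homopolymer run of the
    dispensed base, so the signal equals the run length (0 if no match).
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = FLOW_ORDER[i % len(FLOW_ORDER)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(float(run))
    return signals


def call_bases(signals: list[float]) -> str:
    """Decode a flowgram by rounding each signal to a homopolymer length."""
    out = []
    for i, signal in enumerate(signals):
        base = FLOW_ORDER[i % len(FLOW_ORDER)]
        out.append(base * max(0, round(signal)))
    return "".join(out)


flows = simulate_flowgram("TTACCCGA", n_flows=8)
print(flows)              # [2.0, 1.0, 3.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(call_bases(flows))  # TTACCCGA
```

Rounding is the simplest possible decision rule; it illustrates why homopolymer length must be inferred from signal intensity in this chemistry.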

A sequencing technique that can be used is Helicos True Single Molecule Sequencing (tSMS). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3′ end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface. The templates can be at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides to the primer in a template directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are detected by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step.

Another example of a DNA sequencing technique that can be used is SOLiD technology (Applied Biosystems). In SOLiD sequencing, genomic DNA is sheared into fragments, and adaptors are attached to the 5′ and 3′ ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5′ and 3′ ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5′ and 3′ ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3′ modification that permits bonding to a glass slide.

Another example of a sequencing technology that can be used is SOLEXA sequencing (Illumina). SOLEXA sequencing is based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers. Genomic DNA is fragmented, and adapters are added to the 5′ and 3′ ends of the fragments. DNA fragments that are attached to the surface of flow cell channels are extended and bridge amplified. The fragments become double stranded, and the double stranded molecules are denatured. Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded DNA molecules of the same template in each channel of the flow cell. Primers, DNA polymerase and four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequential sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured and the identity of the first base is recorded. The 3′ terminators and fluorophores from each incorporated base are removed and the incorporation, detection, and identification steps are repeated.
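Because each reversibly terminated nucleotide allows only one base to be added per cycle, every imaging cycle yields one base call. The Python sketch below illustrates that idea with a simple "brightest channel wins" rule. The channel order, the intensity values, and the `min_signal` cutoff for an ambiguous `N` call are all hypothetical; production base callers are considerably more elaborate.

```python
CHANNELS = ("A", "C", "G", "T")  # hypothetical order of the four fluorophore channels


def call_cycle(intensities, min_signal=0.2):
    """Call one base from the four channel intensities measured in one cycle.

    Returns 'N' when no channel reaches min_signal (a hypothetical cutoff).
    """
    best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    if intensities[best] < min_signal:
        return "N"
    return CHANNELS[best]


def call_read(cycles):
    """One base per cycle, since the 3' terminator blocks further extension."""
    return "".join(call_cycle(c) for c in cycles)


cycles = [
    (0.9, 0.1, 0.0, 0.1),   # cycle 1: A channel brightest
    (0.0, 0.1, 0.8, 0.2),   # cycle 2: G channel brightest
    (0.1, 0.05, 0.1, 0.1),  # cycle 3: nothing above the cutoff
    (0.1, 0.1, 0.0, 0.7),   # cycle 4: T channel brightest
]
print(call_read(cycles))  # AGNT
```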

Another example of a sequencing technology that can be used includes the single molecule, real-time (SMRT™) technology of Pacific Biosciences. In SMRT, each of the four DNA bases is attached to one of four different fluorescent dyes. These dyes are phospholinked. A single DNA polymerase is immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW). A ZMW is a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds). It takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Detection of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated.
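The time-scale difference described above (microseconds for diffusing nucleotides, milliseconds for incorporation) is what lets true incorporations be separated from background. A minimal Python sketch, assuming pulses have already been detected and summarized as a base, a start time, and a duration, and using a hypothetical 1 ms duration cutoff:

```python
from typing import NamedTuple


class Pulse(NamedTuple):
    base: str           # base whose dye produced the pulse
    start_ms: float     # pulse start time in milliseconds
    duration_ms: float  # how long the fluorescent signal persisted


def call_incorporations(pulses, min_duration_ms=1.0):
    """Keep pulses long enough to be incorporations and return them in time order."""
    kept = [p for p in pulses if p.duration_ms >= min_duration_ms]
    kept.sort(key=lambda p: p.start_ms)
    return "".join(p.base for p in kept)


pulses = [
    Pulse("G", 0.0, 4.0),
    Pulse("T", 5.2, 0.003),  # microsecond-scale diffusion event, discarded
    Pulse("A", 6.0, 3.5),
    Pulse("C", 11.0, 5.0),
]
print(call_incorporations(pulses))  # GAC
```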

Another example of a sequencing technique that can be used is nanopore sequencing. A nanopore is a small hole, of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows is sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree. Thus, the change in the current passing through the nanopore as the DNA molecule passes through the nanopore represents a reading of the DNA sequence.
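The statement that each nucleotide obstructs the pore to a different degree suggests a simple decoding rule: match each measured current level to the closest reference level. The Python sketch below is only an illustration of that rule. It assumes the current trace has already been segmented into one mean level per nucleotide and uses invented reference levels; real nanopore signals depend on several nucleotides in the pore at once and require more sophisticated models.

```python
# Hypothetical mean blockade current (pA) for each nucleotide; illustrative values only.
REFERENCE_PA = {"A": 52.0, "C": 60.0, "G": 45.0, "T": 68.0}


def decode_levels(levels_pa):
    """Map each segmented current level to the nucleotide with the closest reference level."""
    return "".join(
        min(REFERENCE_PA, key=lambda base: abs(REFERENCE_PA[base] - level))
        for level in levels_pa
    )


print(decode_levels([51.1, 67.0, 46.3, 59.2]))  # ATGC
```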

Another example of a sequencing technique that can be used involves using a chemical-sensitive field effect transistor (chemFET) array to sequence DNA. In one example of the technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3′ end of the sequencing primer can be detected by a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, and the nucleic acids can be amplified on the bead, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.

Another example of a sequencing technique that can be used involves using an electron microscope. In one example of the technique, individual DNA molecules are labeled using metallic labels that are distinguishable using an electron microscope. These molecules are then stretched on a flat surface and imaged using an electron microscope to measure sequences.

The sequence data 622 generated by sequencing can be sent from the polynucleotide sequencer 608 to the digital computer 602 for decoding by the decoder 620 and also for presentation on an output device 614.

Illustrative Embodiments

The following clauses describe multiple possible embodiments for implementing the features described in this disclosure. The various embodiments described herein are not limiting, nor is every feature from any given embodiment required to be present in another embodiment. Any two or more of the embodiments may be combined together unless context clearly indicates otherwise. As used herein in this document, “or” means and/or. For example, “A or B” means A without B, B without A, or A and B. As used herein, “comprising” means including all listed features and potentially including other features that are not listed. “Consisting essentially of” means including the listed features and those additional features that do not materially affect the basic and novel characteristics of the listed features. “Consisting of” means only the listed features to the exclusion of any feature not listed.

Clause 1. A method for encoding binary data in a double stranded deoxyribose nucleic acid (dsDNA), the method comprising: creating a first double strand break (DSB) at a first target site in the dsDNA with a first enzyme; selecting a first homologous repair template according to a first binary digit, the first target site, and an encoding scheme; and contacting the dsDNA with the first homologous repair template, wherein the first homologous repair template contains a region complementary to a second target site for a second enzyme to create a second DSB in the dsDNA and the first binary digit is encoded by an order of a partial sequence of the first target site being adjacent to a partial sequence of the second target site.
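The encoding in Clause 1 depends on which target site ends up adjacent to the current one. A minimal Python sketch of one such context-dependent scheme is given below. It assumes three hypothetical target sites labeled A, B, and C and a hypothetical rule: from the current site, bit 0 selects the template that introduces the first of the two remaining sites (alphabetical order) and bit 1 selects the second. Only the order of adjacent sites is modeled; homology arms, enzymes, and repair efficiency are ignored.

```python
SITES = ("A", "B", "C")  # hypothetical labels for three distinct target sites


def next_site(current: str, bit: int) -> str:
    """Pick the site introduced by the next repair template.

    The two sites other than the current one are ordered; bit 0 selects the
    first and bit 1 the second (a hypothetical context-dependent rule).
    """
    others = [s for s in SITES if s != current]
    return others[bit]


def encode(bits, start="A"):
    """Return the order in which target sites end up adjacent in the dsDNA."""
    order = [start]
    for bit in bits:
        order.append(next_site(order[-1], bit))
    return order


def decode(order):
    """Recover each bit from one adjacent (current, next) pair of sites."""
    bits = []
    for current, nxt in zip(order, order[1:]):
        others = [s for s in SITES if s != current]
        bits.append(others.index(nxt))
    return bits


order = encode([1, 0, 0, 1])
print(order)          # ['A', 'C', 'A', 'B', 'C']
print(decode(order))  # [1, 0, 0, 1]
```

Because the next site always differs from the current one, the same three site labels can be reused for as many bits as needed, which is the property that lets a bounded set of sites record an unbounded bit string.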

Clause 2. The method of clause 1, wherein the first target site is unique in the dsDNA prior to the first DSB.

Clause 3. The method of any of clauses 1 or 2, wherein the first enzyme comprises a restriction enzyme, a CRISPR/Cas system, a TALEN, a zinc finger, or any other protein-protein complex, protein-RNA complex, or protein-protein-RNA complex capable of cutting dsDNA at a specific DNA sequence.

Clause 4. The method of any of clauses 1-3, wherein: a DNA sequence of the first target site comprises a first subsequence that is repeated once; and the first homologous repair template comprises: a 3′-end sequence and a 5′-end sequence each encoding a second subsequence that is complementary to the first subsequence; and two adjacent instances of a third subsequence in the middle of the first homologous repair template.
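The template layout in Clause 4 can be made concrete with a short Python sketch. It assumes one simplified reading of the layout: the template is drawn 3′→5′ directly beneath the top strand, so each template base pairs with the top-strand base above it and the repaired top strand is the base-by-base complement of the template. The subsequences `GATC` and `TTAG` are hypothetical placeholders.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement (no reversal), for strands drawn aligned antiparallel."""
    return seq.translate(COMPLEMENT)


def build_template(first_sub: str, third_sub: str) -> str:
    """Template drawn 3'->5' beneath the top strand: both ends pair with
    first_sub, and the middle carries two adjacent copies of third_sub."""
    return complement(first_sub) + third_sub * 2 + complement(first_sub)


def repaired_top_strand(template: str) -> str:
    """Top-strand sequence copied from the template during repair."""
    return complement(template)


target_sub = "GATC"  # hypothetical first subsequence
new_sub = "TTAG"     # hypothetical third subsequence
tmpl = build_template(target_sub, new_sub)
print(tmpl)                       # CTAGTTAGTTAGCTAG
print(repaired_top_strand(tmpl))  # GATCAATCAATCGATC
```

Under these assumptions, the repaired strand keeps the first subsequence on each side and carries two adjacent copies of `AATC` (the complement of the third subsequence) between them; that repeated pair has the same repeated-subsequence structure as the original target site.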

Clause 5. The method of any of clauses 1-4, wherein the first homologous repair template is selected from two potential homologous repair templates, wherein, according to the encoding scheme, DNA complementary to a first portion of the two potential homologous repair templates represents the binary digit 0 when adjacent to the partial sequence of the first target site and DNA complementary to a second portion of the two potential homologous repair templates represents the binary digit 1 when adjacent to the partial sequence of the first target site.

Clause 6. The method of any of clauses 1-5, further comprising generating the first homologous repair template by oligonucleotide synthesis.

Clause 7. The method of any of clauses 1-6, further comprising generating the first homologous repair template from a ribonucleic acid (RNA) sequence using reverse transcriptase.

Clause 8. The method of clause 7, wherein contacting the dsDNA with the first homologous repair template comprises upregulating expression of a gene coding for the RNA sequence.

Clause 9. The method of any of clauses 1-8, further comprising: generating a sequence file by sequencing the dsDNA after homologous repair of the first DSB with the homologous repair template; and interpreting the sequence file to identify the first binary digit based on the encoding scheme.
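Interpreting a sequence file, as in Clause 9 and as a decoder such as decoder 620 might do, amounts to locating the target sites along a read and decoding adjacent pairs. The Python sketch below assumes three hypothetical six-base site sequences (placeholders, not sequences from this disclosure), an error-free read, and the same hypothetical rule in which bit 0 selects the first remaining site and bit 1 the second.

```python
import re

# Hypothetical sequences for three target sites; real sites would be chosen by design.
SITE_SEQS = {"A": "GGATCC", "B": "GAATTC", "C": "AAGCTT"}


def site_order(read: str) -> list[str]:
    """Return site labels in the order their sequences occur along the read."""
    hits = []
    for label, seq in SITE_SEQS.items():
        for m in re.finditer(seq, read):
            hits.append((m.start(), label))
    return [label for _, label in sorted(hits)]


def decode(order):
    """Bit = index of the next site among the two sites other than the current one."""
    labels = sorted(SITE_SEQS)
    bits = []
    for current, nxt in zip(order, order[1:]):
        others = [s for s in labels if s != current]
        bits.append(others.index(nxt))
    return bits


read = "TTGGATCCAAAAGCTTCCGGATCCTTGAATTCGG"
print(site_order(read))          # ['A', 'C', 'A', 'B']
print(decode(site_order(read)))  # [1, 0, 0]
```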

Clause 10. The method of any of clauses 1-9, further comprising: creating the second DSB at the second target site in the dsDNA with the second enzyme; selecting a second homologous repair template according to a second binary digit, the second target site, and the encoding scheme; and contacting the dsDNA with the second homologous repair template, wherein the second homologous repair template contains a third target site for a third enzyme to create a third DSB in the dsDNA and the second binary digit is encoded by an order of the partial sequence of the second target site followed by a partial sequence of the third target site.

Clause 11. The method of clause 10, further comprising, after the contacting the dsDNA with the first homologous repair template and before the contacting the dsDNA with the second homologous repair template, at least one of: washing the dsDNA to remove the first homologous repair template; or waiting a length of time sufficient for a concentration of the first homologous repair template to decrease below a threshold level.
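Clause 11 leaves the waiting time open. If one assumes, purely for illustration, that the template concentration decays exponentially, C(t) = C0 * exp(-k t) with k = ln 2 / half-life, then the wait needed to fall below a threshold is t = ln(C0 / threshold) / k. The Python sketch below computes that value; the first-order decay model and the example numbers are assumptions, not measurements.

```python
import math


def wait_time(c0: float, threshold: float, half_life: float) -> float:
    """Time for a first-order-decaying concentration to fall from c0 to threshold.

    Assumes C(t) = c0 * exp(-k t) with k = ln 2 / half_life. The result is in
    the same time unit as half_life; concentrations just need matching units.
    """
    if threshold >= c0:
        return 0.0  # already at or below the threshold
    k = math.log(2) / half_life
    return math.log(c0 / threshold) / k


# Hypothetical example: 100 nM template, 1 nM threshold, 30-minute half-life.
print(round(wait_time(100.0, 1.0, 30.0), 1))  # 199.3
```

With these example values the wait is about 6.6 half-lives, since reducing the concentration 100-fold takes log2(100) ≈ 6.64 halvings.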

Clause 12. The method of any of clauses 1-11, wherein contacting the dsDNA with the first homologous repair template comprises transforming a cell containing the dsDNA with the first homologous repair template, thereby introducing the first homologous repair template into the cell, wherein the first homologous repair template is exogenous to the cell.

Clause 13. A cell capable of heritably storing binary data, the cell comprising: a dsDNA molecule comprising a first target site; a first enzyme configured to create a DSB within the first target site; a gene under the control of a promoter and operator that encodes a RNA sequence comprising a 3′-end sequence and a 5′-end sequence; a homologous repair template generated from the gene, the homologous repair template being either: (i) the RNA sequence, wherein the 3′-end sequence and 5′-end sequence are complementary to one strand of the dsDNA molecule over at least a part of the first target site, or (ii) a ssDNA sequence complementary to the RNA sequence, wherein the 3′-end sequence and the 5′-end sequence are complementary to one strand of the dsDNA molecule over at least a part of the first target site; and an engineered signaling pathway that changes a rate of transcription of the RNA sequence in response to an intracellular or extracellular signal.

Clause 14. The cell of clause 13, wherein the first enzyme is a CRISPR/Cas system comprising a guide RNA (gRNA) that includes a spacer region complementary to one strand of the dsDNA at the first target site.

Clause 15. The cell of any of clauses 13 or 14, wherein the homologous repair template comprises a region that is complementary to a second target site such that repair of the DSB within the first target site by the homologous repair template introduces the second target site into the dsDNA molecule, the second target site used by a second enzyme to create a DSB within the second target site.

Clause 16. The cell of any of clauses 13-15, wherein the engineered signaling pathway comprises at least one of a G protein-coupled receptor, a photoreceptor, a thermosensor, or a membrane-bound immunoglobulin (mIg).

Clause 17. The cell of any of clauses 13-16, further comprising an additional dsDNA molecule that comprises an additional instance of the first target site, wherein the dsDNA molecule contains only a single instance of the first target site and the additional dsDNA molecule contains only a single instance of the first target site.

Clause 18. A method comprising: creating a first DSB in a first target site in a dsDNA with a first enzyme; contacting the dsDNA with a first homologous repair template comprising a 3′-end sequence complementary to a first portion of the first target site and a 5′-end sequence complementary to a second portion of the first target site, wherein a portion of the first target site adjacent to a portion of dsDNA generated by repair of the first DSB encodes a first binary digit according to a first encoding scheme and a DNA sequence complementary to a middle portion of the first homologous repair template comprises a second target site according to the first encoding scheme; creating a second DSB in a third target site in the dsDNA with a second enzyme, the third target site having a different sequence than the first target site and being at least one base pair (bp) removed from the first target site; and contacting the dsDNA with a second homologous repair template comprising a 3′-end sequence complementary to a first portion of the third target site and a 5′-end sequence complementary to a second portion of the third target site, wherein a sequence of a portion of the third target site adjacent to a portion of the dsDNA generated by repairing the second DSB encodes a second binary digit according to a second encoding scheme and a DNA sequence complementary to a middle portion of the second homologous repair template comprises a fourth target site according to the second encoding scheme.

Clause 19. The method of clause 18, wherein the first encoding scheme is orthogonal to the second encoding scheme such that target sites belonging to the first target site and second target site are both different than either the third target site or the fourth target site.
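One way to picture Clauses 18 and 19 is as two independent records whose target-site sets share no member, so each record can be extended without touching the other. The Python sketch below assumes hypothetical site labels A to C for the first scheme and D to F for the second, and applies to both the same hypothetical context-dependent rule (bit 0 picks the first remaining site, bit 1 the second).

```python
SCHEME_1 = ("A", "B", "C")  # hypothetical sites used by the first encoding scheme
SCHEME_2 = ("D", "E", "F")  # hypothetical sites used by the second encoding scheme


def orthogonal(sites_1, sites_2) -> bool:
    """True when the two schemes share no target site."""
    return set(sites_1).isdisjoint(sites_2)


def encode(sites, bits):
    """Context-dependent rule: each bit selects one of the two sites other than the current one."""
    order = [sites[0]]
    for bit in bits:
        others = [s for s in sites if s != order[-1]]
        order.append(others[bit])
    return order


print(orthogonal(SCHEME_1, SCHEME_2))  # True
print(encode(SCHEME_1, [1, 0]))        # ['A', 'C', 'A']
print(encode(SCHEME_2, [0, 1]))        # ['D', 'E', 'F']
```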

Clause 20. The method of any of clauses 18 or 19, wherein the method is implemented at least in part in a cell-free system, the dsDNA is a circular dsDNA molecule, and the contacting the dsDNA with the first homologous repair template comprises activating a microfluidic mechanism to move the first homologous repair template into a same chamber as the dsDNA.

CONCLUSION

The terms “a,” “an,” “the” and similar referents used in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context.

All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.

Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member may be referred to and claimed individually or in any combination with other members of the group or other elements found herein. It is anticipated that one or more members of a group may be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is deemed to contain the group as modified, thus fulfilling the written description of all Markush groups used in the appended claims.

Certain embodiments are described herein, including the best mode known to the inventors for carrying out the invention. Of course, variations on these described embodiments will become apparent to those of ordinary skill in the art upon reading the foregoing description. Skilled artisans will know how to employ such variations as appropriate, and the embodiments disclosed herein may be practiced otherwise than specifically described. Accordingly, all modifications and equivalents of the subject matter recited in the claims appended hereto are included within the scope of this disclosure. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.

Furthermore, references have been made to publications, patents and/or patent applications (collectively “references”) throughout this specification. Each of the cited references is individually incorporated herein by reference for their particular cited teachings as well as for all that they disclose.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts are disclosed as example forms of implementing the claims.

1. A cell comprising: a dsDNA molecule comprising a first target site; a first enzyme configured to create a DSB within the first target site; a gene under the control of a promoter and operator that encodes a RNA sequence comprising a 3′-end sequence and a 5′-end sequence; a homologous repair template generated from the gene, the homologous repair template being either: (i) the RNA sequence, wherein the 3′-end sequence and 5′-end sequence are complementary to one strand of the dsDNA molecule over at least a part of the first target site, or (ii) a ssDNA sequence complementary to the RNA sequence, wherein the 3′-end sequence and the 5′-end sequence are complementary to one strand of the dsDNA molecule over at least a part of the first target site; and an engineered signaling pathway that changes a rate of transcription of the RNA sequence in response to an intracellular signal or an extracellular signal.
 2. The cell of claim 1, wherein the dsDNA molecule comprises genomic DNA, a plasmid, or a vector.
 3. The cell of claim 1, wherein the first target site is unique in the dsDNA molecule.
 4. The cell of claim 1, wherein the first enzyme is a CRISPR/Cas system comprising a guide RNA (gRNA) that includes a spacer region complementary to one strand of the dsDNA at the first target site.
 5. The cell of claim 1, wherein the first enzyme is a restriction enzyme that recognizes a recognition sequence within the dsDNA molecule or a TALEN having a DNA binding domain that recognizes the first target site.
 6. The cell of claim 1, wherein the first enzyme is a homing endonuclease, a zinc finger, or any other protein-protein complex, protein-RNA complex, or protein-protein-RNA complex capable of cutting dsDNA at a specific DNA sequence within the target site.
 7. The cell of claim 1, wherein the homologous repair template comprises a region that is complementary to a second target site such that repair of the DSB within the first target site by the homologous repair template introduces the second target site into the dsDNA molecule, wherein the second target site is recognized by a second enzyme configured to create a second DSB within the second target site.
 8. The cell of claim 7, wherein the first target site comprises a first subsequence that is repeated once, the 3′-end sequence of the homologous repair template and the 5′-end sequence of the homologous repair template each encode a second subsequence that is complementary to the first subsequence, and the second target site comprises two adjacent instances of a third subsequence located in the homologous repair template between the 3′-end sequence and the 5′-end sequence.
 9. The cell of claim 7, wherein the second enzyme is a different type of enzyme than the first enzyme.
 10. The cell of claim 1, wherein the engineered signaling pathway comprises at least one of a G protein-coupled receptor, a photoreceptor, a thermosensor, or a membrane-bound immunoglobulin (mIg).
 11. The cell of claim 1, wherein the engineered signaling pathway changes the rate of transcription of the RNA sequence in response to an intracellular signal and the intracellular signal comprises a transcription factor that activates the promoter.
 12. The cell of claim 1, wherein the engineered signaling pathway changes the rate of transcription of the RNA sequence in response to an intracellular signal and the intracellular signal comprises a transcription factor that binds to the operator.
 13. The cell of claim 1, wherein the engineered signaling pathway changes the rate of transcription of the RNA sequence in response to an extracellular signal and the extracellular signal comprises a ligand, a signal mediator, light, an antigen, temperature, or ion concentration.
 14. The cell of claim 1, wherein the homologous repair template is (i) the RNA sequence, wherein the 3′-end sequence and 5′-end sequence are complementary to one strand of the dsDNA molecule over at least a part of the first target site.
 15. The cell of claim 1, wherein the homologous repair template is (ii) the ssDNA sequence complementary to the RNA sequence, wherein the 3′-end sequence and the 5′-end sequence are complementary to one strand of the dsDNA molecule over at least a part of the first target site.
 16. The cell of claim 15, wherein the ssDNA sequence complementary to the RNA sequence is a reverse transcriptase product of the RNA sequence.
 17. The cell of claim 1, further comprising an additional dsDNA molecule that also includes the first target site, wherein the dsDNA molecule contains only a single instance of the first target site and the additional dsDNA molecule contains only a single instance of the first target site.
 18. The cell of claim 1, further comprising an additional dsDNA molecule that comprises a second target site with a DNA sequence that is different than the first target site.
 19. The cell of claim 1, further comprising a second engineered signaling pathway that changes a rate of transcription of a gene that encodes the first enzyme in response to a second intracellular signal or a second extracellular signal.
 20. The cell of claim 19, wherein the intracellular signal or the extracellular signal is different than the second intracellular signal or the second extracellular signal. 